Re: [HACKERS] Bogus WAL segments archived after promotion
On Mon, Apr 13, 2015 at 11:57 PM, Heikki Linnakangas hlinn...@iki.fi wrote: On 04/01/2015 07:12 PM, Bruce Momjian wrote: On Fri, Dec 19, 2014 at 10:26:34PM +0200, Heikki Linnakangas wrote: On 12/19/2014 02:55 PM, Heikki Linnakangas wrote: I'm thinking that we should add a step to promotion, where we scan pg_xlog for any segments higher than the timeline switch point, and remove them, or mark them with .done so that they are not archived. There might be some real WAL that was streamed from the primary, but not yet applied, but such WAL is of no interest to that server anyway, after it's been promoted. It's a bit disconcerting to zap WAL that's valid, even if doesn't belong to the current server's timeline history, because as a general rule it's good to avoid destroying evidence that might be useful in debugging. There isn't much difference between removing them immediately and marking them as .done, though, because they will eventually be removed/recycled anyway if they're marked as .done. This is what I came up with. This patch removes the suspect segments at timeline switch. The alternative of creating .done files for them would preserve more evidence for debugging, but OTOH it would also be very confusing to have valid-looking WAL segments in pg_xlog, with .done files, that in fact contain garbage. The patch is a bit longer than it otherwise would be, because I moved the code to remove a single file from RemoveOldXlogFiles() to a new function. I think that makes it more readable in any case, simply because it was so deeply indented in RemoveOldXlogFiles. Where are we on this? I didn't hear any better ideas, so committed this now. Finally looking at that... The commit log of b2a5545 is a bit misleading. Segment files that were recycled during archive recovery are not necessarily removed, they could be recycled as well during promotion on the new timeline in line with what RemoveOldXlogFiles() does. Hence I think that the comment on top of RemoveNonParentXlogFiles() should be updated to reflect that like in the patch attached. Something minor: perhaps we could refactor xlogarchive.c to have XLogArchiveCheckDone() and XLogArchiveIsBusy() use the new XLogArchiveIsReady(). Regards, -- Michael diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 2580996..e0a8b81 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3596,7 +3596,8 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr) } /* - * Remove WAL files that are not part of the given timeline's history. + * Recycle or remove WAL files that are not part of the given timeline's + * history. * * This is called during recovery, whenever we switch to follow a new * timeline, and at the end of recovery when we create a new timeline. We -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bogus WAL segments archived after promotion
On 04/01/2015 07:12 PM, Bruce Momjian wrote: On Fri, Dec 19, 2014 at 10:26:34PM +0200, Heikki Linnakangas wrote: On 12/19/2014 02:55 PM, Heikki Linnakangas wrote: I'm thinking that we should add a step to promotion, where we scan pg_xlog for any segments higher than the timeline switch point, and remove them, or mark them with .done so that they are not archived. There might be some real WAL that was streamed from the primary, but not yet applied, but such WAL is of no interest to that server anyway, after it's been promoted. It's a bit disconcerting to zap WAL that's valid, even if doesn't belong to the current server's timeline history, because as a general rule it's good to avoid destroying evidence that might be useful in debugging. There isn't much difference between removing them immediately and marking them as .done, though, because they will eventually be removed/recycled anyway if they're marked as .done. This is what I came up with. This patch removes the suspect segments at timeline switch. The alternative of creating .done files for them would preserve more evidence for debugging, but OTOH it would also be very confusing to have valid-looking WAL segments in pg_xlog, with .done files, that in fact contain garbage. The patch is a bit longer than it otherwise would be, because I moved the code to remove a single file from RemoveOldXlogFiles() to a new function. I think that makes it more readable in any case, simply because it was so deeply indented in RemoveOldXlogFiles. Where are we on this? I didn't hear any better ideas, so committed this now. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Bogus WAL segments archived after promotion
On Fri, Dec 19, 2014 at 10:26:34PM +0200, Heikki Linnakangas wrote: On 12/19/2014 02:55 PM, Heikki Linnakangas wrote: I'm thinking that we should add a step to promotion, where we scan pg_xlog for any segments higher than the timeline switch point, and remove them, or mark them with .done so that they are not archived. There might be some real WAL that was streamed from the primary, but not yet applied, but such WAL is of no interest to that server anyway, after it's been promoted. It's a bit disconcerting to zap WAL that's valid, even if doesn't belong to the current server's timeline history, because as a general rule it's good to avoid destroying evidence that might be useful in debugging. There isn't much difference between removing them immediately and marking them as .done, though, because they will eventually be removed/recycled anyway if they're marked as .done. This is what I came up with. This patch removes the suspect segments at timeline switch. The alternative of creating .done files for them would preserve more evidence for debugging, but OTOH it would also be very confusing to have valid-looking WAL segments in pg_xlog, with .done files, that in fact contain garbage. The patch is a bit longer than it otherwise would be, because I moved the code to remove a single file from RemoveOldXlogFiles() to a new function. I think that makes it more readable in any case, simply because it was so deeply indented in RemoveOldXlogFiles. Where are we on this? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Bogus WAL segments archived after promotion
When streaming replication was introduced in 9.0, we started to recycle old WAL segments in archive recovery, like we do during normal operation. The WAL segments are recycled on the current timeline. There is no guarantee that they are useful, if the current timeline changes, because we step to recover another timeline after that, or the standby is promoted, but that was thought to be harmless. However, consider what happens after a server is promoted, and WAL archiving is enabled. The server's pg_xlog directory will look something like this: -rw--- 1 heikki heikki 16777216 Dec 19 14:22 00010005 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010006 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010007 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010008 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010009 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000A -rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000B -rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000C -rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000D -rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000E -rw--- 1 heikki heikki 16777216 Dec 19 14:23 0001000F -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010010 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010011 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010012 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010013 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010014 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010015 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010016 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010017 -rw--- 1 heikki heikki 16777216 Dec 19 14:23 00010018 -rw--- 1 heikki heikki 16777216 Dec 19 14:24 00010019 -rw--- 1 heikki heikki 16777216 Dec 19 14:22 0001001A -rw--- 1 heikki heikki 16777216 Dec 19 14:22 0001001B -rw--- 1 heikki heikki 16777216 Dec 19 14:22 0001001C -rw--- 1 heikki heikki 16777216 Dec 19 14:24 00020019 -rw--- 1 heikki heikki 16777216 Dec 19 14:24 0002001A -rw--- 1 heikki heikki 42 Dec 19 14:24 0002.history The files on timeline 1, up to 00010019, are valid segments, streamed from the primary or restored from the WAL archive. The segments 0001001A and 0001001B are recycled segments that haven't been reused yet. Their contents are not valid (they contain records from some earlier point in WAL, but it might as well be garbage). The server was promoted within the segment 19, and a new timeline was started. Segments 00020019 and 0002001A contain valid WAL on the new timeline. Now, after enough time passes that the bogus 0001001A and 0001001B segments become old enough to be recycled, the system will see that there is no .ready or .done file for them, and will create .ready files so that they are archived. And they are archived. That's bogus, because the files are bogus. Worse, if the primary server where this server was forked off from continues running, and creates the genuine 0001001A and 0001001B segments, it can fail to archive them if the standby had already archived the bogus segments with the same names. We must somehow prevent the recycled, but not yet used, segments from being archived. One idea is to not create them in the first place, i.e. don't recycle old segments during recovery, just delete them and have new ones be created on demand. That's simple, but would hurt performance. I'm thinking that we should add a step to promotion, where we scan pg_xlog for any segments higher than the timeline switch point, and remove them, or mark them with .done so that they are not archived. There might be some real WAL that was streamed from the primary, but not yet applied, but such WAL is of no interest to that server anyway, after it's been promoted. It's a bit disconcerting to zap WAL that's valid, even if doesn't belong to the current server's timeline history, because as a general rule it's good to avoid destroying evidence that might be useful in debugging. There isn't much difference between removing them immediately and marking them as .done, though, because they will eventually be removed/recycled anyway if they're marked as .done. The archival behaviour at promotion is a bit inconsistent and weird anyway; even valid, streamed WAL is marked as .done and not archived anyway, except for the last partial segment. We're
Re: [HACKERS] Bogus WAL segments archived after promotion
On 12/19/2014 02:55 PM, Heikki Linnakangas wrote: I'm thinking that we should add a step to promotion, where we scan pg_xlog for any segments higher than the timeline switch point, and remove them, or mark them with .done so that they are not archived. There might be some real WAL that was streamed from the primary, but not yet applied, but such WAL is of no interest to that server anyway, after it's been promoted. It's a bit disconcerting to zap WAL that's valid, even if doesn't belong to the current server's timeline history, because as a general rule it's good to avoid destroying evidence that might be useful in debugging. There isn't much difference between removing them immediately and marking them as .done, though, because they will eventually be removed/recycled anyway if they're marked as .done. This is what I came up with. This patch removes the suspect segments at timeline switch. The alternative of creating .done files for them would preserve more evidence for debugging, but OTOH it would also be very confusing to have valid-looking WAL segments in pg_xlog, with .done files, that in fact contain garbage. The patch is a bit longer than it otherwise would be, because I moved the code to remove a single file from RemoveOldXlogFiles() to a new function. I think that makes it more readable in any case, simply because it was so deeply indented in RemoveOldXlogFiles. Thoughts? - Heikki commit 9e5c1033fbec2ebe553d68a1a7e343c8779109d0 Author: Heikki Linnakangas heikki.linnakan...@iki.fi Date: Fri Dec 19 21:12:40 2014 +0200 Don't archive bogus recycled or preallocated files after timeline switch. After a timeline switch, either because we're following a timeline switch during replay, or because end of recovery, we would leave behind recycled WAL segments that are in the future, but on the old timeline. After promotion, and after they become old enough to be recycled again, we would notice that they don't have a .ready or .done file, create a .ready file for them, and archive them. That's bogus, because the files contain garbage, recycled from an older timeline (or prealloced as zeros). We shouldn't archive such files. To fix, whenever we switch to a new timeline, scan the data directory for WAL segments on the old timeline, but with a higher segment number, and remove them. Those don't belong to our timeline history, and are most likely bogus recycled or preallocated files. They could also be valid files that we streamed from the primary ahead of time, but in any case, they're not needed to recover to the new timeline. diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 9e45976..e267ca1 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -792,6 +792,8 @@ static int emode_for_corrupt_record(int emode, XLogRecPtr RecPtr); static void XLogFileClose(void); static void PreallocXlogFiles(XLogRecPtr endptr); static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr); +static void RemoveXlogFile(const char *segname, XLogRecPtr endptr); +static void RemoveNonParentXlogFiles(XLogRecPtr switchpoint, TimeLineID newTLI); static void UpdateLastRemovedPtr(char *filename); static void ValidateXLOGDirectoryStructure(void); static void CleanupBackupHistory(void); @@ -3431,24 +3433,9 @@ UpdateLastRemovedPtr(char *filename) static void RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr) { - XLogSegNo endlogSegNo; - int max_advance; DIR *xldir; struct dirent *xlde; char lastoff[MAXFNAMELEN]; - char path[MAXPGPATH]; - -#ifdef WIN32 - char newpath[MAXPGPATH]; -#endif - struct stat statbuf; - - /* - * Initialize info about where to try to recycle to. We allow recycling - * segments up to XLOGfileslop segments beyond the current XLOG location. - */ - XLByteToPrevSeg(endptr, endlogSegNo); - max_advance = XLOGfileslop; xldir = AllocateDir(XLOGDIR); if (xldir == NULL) @@ -3469,6 +3456,11 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr) while ((xlde = ReadDir(xldir, XLOGDIR)) != NULL) { + /* Ignore files that are not XLOG segments */ + if (strlen(xlde-d_name) != 24 || + strspn(xlde-d_name, 0123456789ABCDEF) != 24) + continue; + /* * We ignore the timeline part of the XLOG segment identifiers in * deciding whether a segment is still needed. This ensures that we @@ -3480,92 +3472,110 @@ RemoveOldXlogFiles(XLogSegNo segno, XLogRecPtr endptr) * We use the alphanumeric sorting property of the filenames to decide * which ones are earlier than the lastoff segment. */ - if (strlen(xlde-d_name) == 24 - strspn(xlde-d_name, 0123456789ABCDEF) == 24 - strcmp(xlde-d_name + 8, lastoff + 8) = 0) + if (strcmp(xlde-d_name + 8, lastoff + 8) = 0) { if (XLogArchiveCheckDone(xlde-d_name)) { -snprintf(path, MAXPGPATH, XLOGDIR /%s, xlde-d_name); - /* Update the last removed