Re: Change pgarch_readyXlog() to return .history files first
On 12/24/18 1:31 PM, Michael Paquier wrote: On Sat, Dec 22, 2018 at 08:55:14AM +0900, Michael Paquier wrote: Thanks for the lookups. I can see that the patch applies without conflicts down to 9.4, and based on the opinions gathered on this thread back-patching this stuff is the consensus, based on input from Kyotaro Horiguchi and David Steele (I don't mind much myself). So, any objections from others in doing so? On REL9_4_STABLE, IsTLHistoryFileName() goes missing, so committed "only" down to 9.5. Thanks David for the patch and Horiguchi-san for the review. Thanks, Michael! I'm not too worried about 9.4 since it is currently the oldest supported version. HA users tend to be on the leading edge of the upgrade curve and others have the opportunity to upgrade if the reordering will help them. -- -David da...@pgmasters.net
Re: Change pgarch_readyXlog() to return .history files first
On Sat, Dec 22, 2018 at 08:55:14AM +0900, Michael Paquier wrote: > Thanks for the lookups. I can see that the patch applies without > conflicts down to 9.4, and based on the opinions gathered on this > thread back-patching this stuff is the consensus, based on input from > Kyotaro Horiguchi and David Steele (I don't mind much myself). So, > any objections from others in doing so? On REL9_4_STABLE, IsTLHistoryFileName() goes missing, so committed "only" down to 9.5. Thanks David for the patch and Horiguchi-san for the review. -- Michael signature.asc Description: PGP signature
Re: Change pgarch_readyXlog() to return .history files first
On Fri, Dec 21, 2018 at 08:17:12AM +0200, David Steele wrote: > I thought about doing that, but wanted to focus on the task at hand. It > does save a strcpy and a bit of stack space, so seems like a win. > > Overall, the patch looks good to me. I think breaking up the if does make > the code more readable. Thanks for the lookups. I can see that the patch applies without conflicts down to 9.4, and based on the opinions gathered on this thread back-patching this stuff is the consensus, based on input from Kyotaro Horiguchi and David Steele (I don't mind much myself). So, any objections from others in doing so? -- Michael signature.asc Description: PGP signature
Re: Change pgarch_readyXlog() to return .history files first
On 12/21/18 6:49 AM, Kyotaro HORIGUCHI wrote: "else if (!historyFound || ishistory)" strcpy(xlog, newxlog); The caller prepares sufficient memory for basename, and we no longer copy ".ready" into newxlog. Douldn't we work directly on xlog instead of allocating newxlog? I thought about doing that, but wanted to focus on the task at hand. It does save a strcpy and a bit of stack space, so seems like a win. Overall, the patch looks good to me. I think breaking up the if does make the code more readable. Regards, -- -David da...@pgmasters.net
Re: Change pgarch_readyXlog() to return .history files first
At Fri, 21 Dec 2018 14:17:25 +0900, Michael Paquier wrote in <20181221051724.gg1...@paquier.xyz> > On Fri, Dec 21, 2018 at 01:49:18PM +0900, Kyotaro HORIGUCHI wrote: > > FWIW it seems to me a bug that making an inconsistent set of > > files in archive directory. > > Okay, point taken! FWIW, I have no actual objections in not > back-patching that. I maybe(?) know. > > At Fri, 21 Dec 2018 12:19:21 +0900, Michael Paquier > > wrote in <20181221031921.ge1...@paquier.xyz> > >> On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote: > >> > Good point. The new patch uses IsTLHistoryFileName(). > >> > >> OK, I have been reviewing the patch and the logic is correct, still I > >> could not resist reducing the number of inner if's for readability. I > > > > +1 basically. But I think that tail(name, 6) != ".ready" can > > happen with a certain frequency but strspn(name) < basenamelen > > rarely in the normal case. > > So that +0.5 if "basically" means a partial agreement? :p Mmm. No, +0.9. > > In the else if condition, ishisotry must be false in the right > > hand of ||. What we do here is ignoring non-history files once > > history file found. (Just a logic condensing and it would be done > > by compiler, though) > > Yes, this can be simplified. So let's do so. > > > The caller prepares sufficient memory for basename, and we no > > longer copy ".ready" into newxlog. Wouldn't we work directly on Sorry for silly typo, but the 'W' was 'C', I meant:p > > xlog instead of allocating newxlog? > > Okay, let's simplify that as you suggest. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Change pgarch_readyXlog() to return .history files first
On Fri, Dec 21, 2018 at 01:49:18PM +0900, Kyotaro HORIGUCHI wrote: > FWIW it seems to me a bug that making an inconsistent set of > files in archive directory. Okay, point taken! FWIW, I have no actual objections in not back-patching that. > At Fri, 21 Dec 2018 12:19:21 +0900, Michael Paquier > wrote in <20181221031921.ge1...@paquier.xyz> >> On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote: >> > Good point. The new patch uses IsTLHistoryFileName(). >> >> OK, I have been reviewing the patch and the logic is correct, still I >> could not resist reducing the number of inner if's for readability. I > > +1 basically. But I think that tail(name, 6) != ".ready" can > happen with a certain frequency but strspn(name) < basenamelen > rarely in the normal case. So that +0.5 if "basically" means a partial agreement? :p > In the else if condition, ishisotry must be false in the right > hand of ||. What we do here is ignoring non-history files once > history file found. (Just a logic condensing and it would be done > by compiler, though) Yes, this can be simplified. So let's do so. > The caller prepares sufficient memory for basename, and we no > longer copy ".ready" into newxlog. Wouldn't we work directly on > xlog instead of allocating newxlog? Okay, let's simplify that as you suggest. -- Michael signature.asc Description: PGP signature
Re: Change pgarch_readyXlog() to return .history files first
Hello. FWIW it seems to me a bug that making an inconsistent set of files in archive directory. At Fri, 21 Dec 2018 12:19:21 +0900, Michael Paquier wrote in <20181221031921.ge1...@paquier.xyz> > On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote: > > Good point. The new patch uses IsTLHistoryFileName(). > > OK, I have been reviewing the patch and the logic is correct, still I > could not resist reducing the number of inner if's for readability. I +1 basically. But I think that tail(name, 6) != ".ready" can happen with a certain frequency but strspn(name) < basenamelen rarely in the normal case. > also did not like the high-jacking of rlde->d_name so instead let's use > an intermediate variable to store the basename of a scanned entry. The > format of the if/elif with comments in-between was not really consistent > with the common practice as well. pg_indent has also been applied. > > > Ah, I see. Yes, that's exactly how I tested it, in addition to doing real > > promotions. > > OK, so am I doing. > > Attached is an updated patch. Does that look fine to you? The base > logic is unchanged, and after a promotion history files get archived > before the last partial segment. Renaming history to ishistory looks good. if (!found || (ishistory && !historyFound)) { /* init */ found = true; historyFound = ishistory; } else if (ishistory || (!ishstory && !historyFound)) /* compare/replace */ In the else if condition, ishisotry must be false in the right hand of ||. What we do here is ignoring non-history files once history file found. (Just a logic condensing and it would be done by compiler, though) "else if (!historyFound || ishistory)" > strcpy(xlog, newxlog); The caller prepares sufficient memory for basename, and we no longer copy ".ready" into newxlog. Douldn't we work directly on xlog instead of allocating newxlog? regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Re: Change pgarch_readyXlog() to return .history files first
On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote: > Good point. The new patch uses IsTLHistoryFileName(). OK, I have been reviewing the patch and the logic is correct, still I could not resist reducing the number of inner if's for readability. I also did not like the high-jacking of rlde->d_name so instead let's use an intermediate variable to store the basename of a scanned entry. The format of the if/elif with comments in-between was not really consistent with the common practice as well. pg_indent has also been applied. > Ah, I see. Yes, that's exactly how I tested it, in addition to doing real > promotions. OK, so am I doing. Attached is an updated patch. Does that look fine to you? The base logic is unchanged, and after a promotion history files get archived before the last partial segment. -- Michael diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index e88d545d65..bbd6311b35 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -695,11 +695,12 @@ pgarch_archiveXlog(char *xlog) * 2) because the oldest ones will sooner become candidates for * recycling at time of checkpoint * - * NOTE: the "oldest" comparison will presently consider all segments of - * a timeline with a smaller ID to be older than all segments of a timeline - * with a larger ID; the net result being that past timelines are given - * higher priority for archiving. This seems okay, or at least not - * obviously worth changing. + * NOTE: the "oldest" comparison will consider any .history file to be older + * than any other file except another .history file. Segments on a timeline + * with a smaller ID will be older than all segments on a timeline with a + * larger ID; the net result being that past timelines are given higher + * priority for archiving. This seems okay, or at least not obviously worth + * changing. */ static bool pgarch_readyXlog(char *xlog) @@ -715,6 +716,7 @@ pgarch_readyXlog(char *xlog) DIR *rldir; struct dirent *rlde; bool found = false; + bool historyFound = false; snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status"); rldir = AllocateDir(XLogArchiveStatusDir); @@ -722,32 +724,54 @@ pgarch_readyXlog(char *xlog) while ((rlde = ReadDir(rldir, XLogArchiveStatusDir)) != NULL) { int basenamelen = (int) strlen(rlde->d_name) - 6; + char basename[MAX_XFN_CHARS + 1]; + bool ishistory; - if (basenamelen >= MIN_XFN_CHARS && - basenamelen <= MAX_XFN_CHARS && - strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen && - strcmp(rlde->d_name + basenamelen, ".ready") == 0) + /* Ignore entries with unexpected number of characters */ + if (basenamelen < MIN_XFN_CHARS || + basenamelen > MAX_XFN_CHARS) + continue; + + /* Ignore entries with unexpected characters */ + if (strspn(rlde->d_name, VALID_XFN_CHARS) < basenamelen) + continue; + + /* Ignore anything not suffixed with .ready */ + if (strcmp(rlde->d_name + basenamelen, ".ready") != 0) + continue; + + /* Truncate off the .ready */ + memcpy(basename, rlde->d_name, basenamelen); + basename[basenamelen] = '\0'; + + /* Is this a history file? */ + ishistory = IsTLHistoryFileName(basename); + + /* + * Consume the file to archive. History files have the highest + * priority. If this is the first file or the first history file + * ever, copy it. In the presence of a history file already chosen as + * target, ignore all other files except history files which have been + * generated for an older timeline than what is already chosen as + * target to archive. + */ + if (!found || (ishistory && !historyFound)) { - if (!found) - { -strcpy(newxlog, rlde->d_name); -found = true; - } - else - { -if (strcmp(rlde->d_name, newxlog) < 0) - strcpy(newxlog, rlde->d_name); - } + strcpy(newxlog, basename); + found = true; + historyFound = ishistory; + } + else if (ishistory || (!ishistory && !historyFound)) + { + if (strcmp(basename, newxlog) < 0) +strcpy(newxlog, basename); } } FreeDir(rldir); if (found) - { - /* truncate off the .ready */ - newxlog[strlen(newxlog) - 6] = '\0'; strcpy(xlog, newxlog); - } + return found; } signature.asc Description: PGP signature
Re: Change pgarch_readyXlog() to return .history files first
On 12/15/18 2:10 AM, Michael Paquier wrote: On Fri, Dec 14, 2018 at 08:43:20AM -0500, David Steele wrote: On 12/13/18 7:15 PM, Michael Paquier wrote: Or you could just use IsTLHistoryFileName here? We'd have to truncate .ready off the string to make that work, which seems easy enough. Is that what you were thinking? Yes, that's the idea. pgarch_readyXlog returns the segment name which should be archived, so you could just compute it after detecting a .ready file. One thing to consider is the check above is more efficient than IsTLHistoryFileName() and it potentially gets run a lot. This check misses strspn(), so any file finishing with .history would get eaten even if that's unlikely to happen. Good point. The new patch uses IsTLHistoryFileName(). If one wants to simply check this code, you can just create dummy orphan files in archive_status and see in which order they get removed. Seems awfully racy. Are there currently any tests like this for the archiver that I can look at extending? Sorry for the confusion, I was referring to manual testing here. Ah, I see. Yes, that's exactly how I tested it, in addition to doing real promotions. Thinking about it, we could have an automatic test to check for the file order pattern by creating dummy files, starting the server with archiver enabled, and then parse the logs as orphan .ready files would get removed in the order their archiving is attempted with one WARNING entry generated for each. I am not sure if that is worth a test though. Yes, parsing the logs was the best thing I could think of, too. -- -David da...@pgmasters.net >From 1987a90b724506c7d9e453d9a976e3f982f625ec Mon Sep 17 00:00:00 2001 From: David Steele Date: Thu, 20 Dec 2018 13:28:51 +0200 Subject: [PATCH] Change pgarch_readyXlog() to return .history files first. The alphabetical ordering of pgarch_readyXlog() means that on promotion 000100010001.partial will be archived before 0002.history. This appears harmless, but the .history files are what other potential primaries use to decide what timeline they should pick. The additional latency of compressing/transferring the much larger partial file means that archiving of the .history file is delayed and greatly increases the chance that another primary will promote to the same timeline. Teach pgarch_readyXlog() to return .history files first (and in order) to reduce the window where this can happen. This won't prevent all conflicts, but it is a simple change and should greatly reduce real-world occurrences. --- src/backend/postmaster/pgarch.c | 36 +++-- 1 file changed, 25 insertions(+), 11 deletions(-) diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index e88d545d65..50dc2027b1 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -695,11 +695,11 @@ pgarch_archiveXlog(char *xlog) * 2) because the oldest ones will sooner become candidates for * recycling at time of checkpoint * - * NOTE: the "oldest" comparison will presently consider all segments of - * a timeline with a smaller ID to be older than all segments of a timeline - * with a larger ID; the net result being that past timelines are given - * higher priority for archiving. This seems okay, or at least not - * obviously worth changing. + * NOTE: the "oldest" comparison will consider any .history file to be older + * than any other file except another .history file. Segments on a timeline + * with a smaller ID will be older than all segments on a timeline with a larger + * ID; the net result being that past timelines are given higher priority for + * archiving. This seems okay, or at least not obviously worth changing. */ static bool pgarch_readyXlog(char *xlog) @@ -715,6 +715,7 @@ pgarch_readyXlog(char *xlog) DIR*rldir; struct dirent *rlde; boolfound = false; + boolhistoryFound = false; snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status"); rldir = AllocateDir(XLogArchiveStatusDir); @@ -728,12 +729,28 @@ pgarch_readyXlog(char *xlog) strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen && strcmp(rlde->d_name + basenamelen, ".ready") == 0) { - if (!found) +/* Truncate off the .ready */ +rlde->d_name[basenamelen] = '\0'; + + /* Is this a history file? */ + bool history = IsTLHistoryFileName(rlde->d_name); + + /* +* If this is the first file or the first history file, copy it +*/ + if (!found || (history && !historyFound)) { s
Re: Change pgarch_readyXlog() to return .history files first
On Fri, Dec 14, 2018 at 08:43:20AM -0500, David Steele wrote: > On 12/13/18 7:15 PM, Michael Paquier wrote: >> I would like to hear opinion from other though if we should consider >> that as an improvement or an actual bug fix. Changing the order of the >> files to map with what the startup process does when promoting does not >> sound like a bug fix to me, still this is not really invasive, so we >> could really consider it worth back-patching to reduce common pain from >> users when it comes to timeline handling. > > I think an argument can be made that it is a bug (ish). Postgres > generates the files in one order, and they get archived in a different > order. I am not completely sure either. In my experience, if there is any doubt on such definitions the best answer is to not backpatch. >> Or you could just use IsTLHistoryFileName here? > > We'd have to truncate .ready off the string to make that work, which > seems easy enough. Is that what you were thinking? Yes, that's the idea. pgarch_readyXlog returns the segment name which should be archived, so you could just compute it after detecting a .ready file. > One thing to consider is the check above is more efficient than > IsTLHistoryFileName() and it potentially gets run a lot. This check misses strspn(), so any file finishing with .history would get eaten even if that's unlikely to happen. >> If one wants to simply check this code, you can just create dummy orphan >> files in archive_status and see in which order they get removed. > > Seems awfully racy. Are there currently any tests like this for the > archiver that I can look at extending? Sorry for the confusion, I was referring to manual testing here. Thinking about it, we could have an automatic test to check for the file order pattern by creating dummy files, starting the server with archiver enabled, and then parse the logs as orphan .ready files would get removed in the order their archiving is attempted with one WARNING entry generated for each. I am not sure if that is worth a test though. -- Michael signature.asc Description: PGP signature
Re: Change pgarch_readyXlog() to return .history files first
On 12/13/18 7:15 PM, Michael Paquier wrote: > On Thu, Dec 13, 2018 at 11:53:53AM -0500, David Steele wrote: >> I also think we should consider back-patching this change. It's hard to >> imagine that archive commands would have trouble with this reordering >> and the current ordering causes real pain in HA clusters. > > I would like to hear opinion from other though if we should consider > that as an improvement or an actual bug fix. Changing the order of the > files to map with what the startup process does when promoting does not > sound like a bug fix to me, still this is not really invasive, so we > could really consider it worth back-patching to reduce common pain from > users when it comes to timeline handling. I think an argument can be made that it is a bug (ish). Postgres generates the files in one order, and they get archived in a different order. >> -if (!found) >> +/* Is this a history file? */ >> +bool history = basenamelen >= sizeof(".history") && >> +strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1), >> + ".history.ready") == 0; > > Or you could just use IsTLHistoryFileName here? We'd have to truncate .ready off the string to make that work, which seems easy enough. Is that what you were thinking? One thing to consider is the check above is more efficient than IsTLHistoryFileName() and it potentially gets run a lot. > If one wants to simply check this code, you can just create dummy orphan > files in archive_status and see in which order they get removed. Seems awfully racy. Are there currently any tests like this for the archiver that I can look at extending? Regards, -- -David da...@pgmasters.net signature.asc Description: OpenPGP digital signature
Re: Change pgarch_readyXlog() to return .history files first
On Thu, Dec 13, 2018 at 11:53:53AM -0500, David Steele wrote: > I also think we should consider back-patching this change. It's hard to > imagine that archive commands would have trouble with this reordering > and the current ordering causes real pain in HA clusters. I would like to hear opinion from other though if we should consider that as an improvement or an actual bug fix. Changing the order of the files to map with what the startup process does when promoting does not sound like a bug fix to me, still this is not really invasive, so we could really consider it worth back-patching to reduce common pain from users when it comes to timeline handling. > -if (!found) > +/* Is this a history file? */ > +bool history = basenamelen >= sizeof(".history") && > +strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1), > + ".history.ready") == 0; Or you could just use IsTLHistoryFileName here? If one wants to simply check this code, you can just create dummy orphan files in archive_status and see in which order they get removed. -- Michael signature.asc Description: PGP signature
Re: Change pgarch_readyXlog() to return .history files first
On 12/13/18 11:53 AM, David Steele wrote: > Hackers, > > The alphabetical ordering of pgarch_readyXlog() means that on promotion > 000100010001.partial will be archived before 0002.history. > > This appears harmless, but the .history files are what other potential > primaries use to decide what timeline they should pick. The additional > latency of compressing/transferring the much larger partial file means > that archiving of the .history file is delayed and greatly increases the > chance that another primary will promote to the same timeline. > > Teach pgarch_readyXlog() to return .history files first (and in order) > to reduce the window where this can happen. This won't prevent all > conflicts, but it is a simple change and should greatly reduce > real-world occurrences. > > I also think we should consider back-patching this change. It's hard to > imagine that archive commands would have trouble with this reordering > and the current ordering causes real pain in HA clusters. Some gcc versions wanted more parens, so updated in attached. -- -David da...@pgmasters.net From d5fd60b4e329fe99b2f6565355014040dcdcee26 Mon Sep 17 00:00:00 2001 From: David Steele Date: Thu, 13 Dec 2018 12:58:27 -0500 Subject: [PATCH] Change pgarch_readyXlog() to return .history files first. The alphabetical ordering of pgarch_readyXlog() means that on promotion 000100010001.partial will be archived before 0002.history. This appears harmless, but the .history files are what other potential primaries use to decide what timeline they should pick. The additional latency of compressing/transferring the much larger partial file means that archiving of the .history file is delayed and greatly increases the chance that another primary will promote to the same timeline. Teach pgarch_readyXlog() to return .history files first (and in order) to reduce the window where this can happen. This won't prevent all conflicts, but it is a simple change and should greatly reduce real-world occurrences. --- src/backend/postmaster/pgarch.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 0dcf609f19..95898080d7 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -704,11 +704,11 @@ pgarch_archiveXlog(char *xlog) * 2) because the oldest ones will sooner become candidates for * recycling at time of checkpoint * - * NOTE: the "oldest" comparison will presently consider all segments of - * a timeline with a smaller ID to be older than all segments of a timeline - * with a larger ID; the net result being that past timelines are given - * higher priority for archiving. This seems okay, or at least not - * obviously worth changing. + * NOTE: the "oldest" comparison will consider any .history file to be older + * than any other file except another .history file. Segments on a timeline + * with a smaller ID will be older than all segments on a timeline with a larger + * ID; the net result being that past timelines are given higher priority for + * archiving. This seems okay, or at least not obviously worth changing. */ static bool pgarch_readyXlog(char *xlog) @@ -724,6 +724,7 @@ pgarch_readyXlog(char *xlog) DIR*rldir; struct dirent *rlde; boolfound = false; + boolhistoryFound = false; snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status"); rldir = AllocateDir(XLogArchiveStatusDir); @@ -737,12 +738,27 @@ pgarch_readyXlog(char *xlog) strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen && strcmp(rlde->d_name + basenamelen, ".ready") == 0) { - if (!found) + /* Is this a history file? */ + bool history = basenamelen >= sizeof(".history") && + strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1), + ".history.ready") == 0; + + /* +* If this is the first file or the first history file, copy it +*/ + if (!found || (history && !historyFound)) { strcpy(newxlog, rlde->d_name); found = true; + historyFound = history; } - else + /* +* Else compare and see if this file is alpabetically less than +* what we already have. If so, c
Change pgarch_readyXlog() to return .history files first
Hackers, The alphabetical ordering of pgarch_readyXlog() means that on promotion 000100010001.partial will be archived before 0002.history. This appears harmless, but the .history files are what other potential primaries use to decide what timeline they should pick. The additional latency of compressing/transferring the much larger partial file means that archiving of the .history file is delayed and greatly increases the chance that another primary will promote to the same timeline. Teach pgarch_readyXlog() to return .history files first (and in order) to reduce the window where this can happen. This won't prevent all conflicts, but it is a simple change and should greatly reduce real-world occurrences. I also think we should consider back-patching this change. It's hard to imagine that archive commands would have trouble with this reordering and the current ordering causes real pain in HA clusters. Regards, -- -David da...@pgmasters.net From 2279697ce828e825066065e3e40c9926633523ad Mon Sep 17 00:00:00 2001 From: David Steele Date: Thu, 13 Dec 2018 11:00:37 -0500 Subject: [PATCH] Change pgarch_readyXlog() to return .history files first The alphabetical ordering of pgarch_readyXlog() means that on promotion 000100010001.partial will be archived before 0002.history. This appears harmless, but the .history files are what other potential primaries use to decide what timeline they should pick. The additional latency of compressing/transferring the much larger partial file means that archiving of the .history file is delayed and greatly increases the chance that another primary will promote to the same timeline. Teach pgarch_readyXlog() to return .history files first (and in order) to reduce the window where this can happen. This won't prevent all conflicts, but it is a simple change and should greatly reduce real-world occurrences. --- src/backend/postmaster/pgarch.c | 30 +++--- 1 file changed, 23 insertions(+), 7 deletions(-) diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 0dcf609f19..5dbab6a3d7 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -704,11 +704,11 @@ pgarch_archiveXlog(char *xlog) * 2) because the oldest ones will sooner become candidates for * recycling at time of checkpoint * - * NOTE: the "oldest" comparison will presently consider all segments of - * a timeline with a smaller ID to be older than all segments of a timeline - * with a larger ID; the net result being that past timelines are given - * higher priority for archiving. This seems okay, or at least not - * obviously worth changing. + * NOTE: the "oldest" comparison will consider any .history file to be older + * than any other file except another .history file. Segments on a timeline + * with a smaller ID will be older than all segments on a timeline with a larger + * ID; the net result being that past timelines are given higher priority for + * archiving. This seems okay, or at least not obviously worth changing. */ static bool pgarch_readyXlog(char *xlog) @@ -724,6 +724,7 @@ pgarch_readyXlog(char *xlog) DIR*rldir; struct dirent *rlde; boolfound = false; + boolhistoryFound = false; snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status"); rldir = AllocateDir(XLogArchiveStatusDir); @@ -737,12 +738,27 @@ pgarch_readyXlog(char *xlog) strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen && strcmp(rlde->d_name + basenamelen, ".ready") == 0) { - if (!found) + /* Is this a history file? */ + bool history = basenamelen >= sizeof(".history") && + strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1), + ".history.ready") == 0; + + /* +* If this is the first file or the first history file, copy it +*/ + if (!found || history && !historyFound) { strcpy(newxlog, rlde->d_name); found = true; + historyFound = history; } - else + /* +* Else compare and see if this file is alpabetically less than +* what we already have. If so, copy it. History files always get +* this comparison but other files only do when no history file has +* b