Hackers,

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000000010000000100000001.partial will be archived before 00000002.history.

This appears harmless, but the .history files are what other potential
primaries use to decide what timeline they should pick.  The additional
latency of compressing/transferring the much larger partial file means
that archiving of the .history file is delayed and greatly increases the
chance that another primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order)
to reduce the window where this can happen.  This won't prevent all
conflicts, but it is a simple change and should greatly reduce
real-world occurrences.

I also think we should consider back-patching this change.  It's hard to
imagine that archive commands would have trouble with this reordering
and the current ordering causes real pain in HA clusters.

Regards,
-- 
-David
da...@pgmasters.net
From 2279697ce828e825066065e3e40c9926633523ad Mon Sep 17 00:00:00 2001
From: David Steele <da...@pgmasters.net>
Date: Thu, 13 Dec 2018 11:00:37 -0500
Subject: [PATCH] Change pgarch_readyXlog() to return .history files first

The alphabetical ordering of pgarch_readyXlog() means that on promotion 
000000010000000100000001.partial will be archived before 00000002.history.

This appears harmless, but the .history files are what other potential 
primaries 
use to decide what timeline they should pick.  The additional latency of 
compressing/transferring the much larger partial file means that archiving of 
the .history file is delayed and greatly increases the chance that another 
primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order) to 
reduce 
the window where this can happen.  This won't prevent all conflicts, but it is 
a 
simple change and should greatly reduce real-world occurrences.
---
 src/backend/postmaster/pgarch.c | 30 +++++++++++++++++++++++-------
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 0dcf609f19..5dbab6a3d7 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -704,11 +704,11 @@ pgarch_archiveXlog(char *xlog)
  * 2) because the oldest ones will sooner become candidates for
  * recycling at time of checkpoint
  *
- * NOTE: the "oldest" comparison will presently consider all segments of
- * a timeline with a smaller ID to be older than all segments of a timeline
- * with a larger ID; the net result being that past timelines are given
- * higher priority for archiving.  This seems okay, or at least not
- * obviously worth changing.
+ * NOTE: the "oldest" comparison will consider any .history file to be older
+ * than any other file except another .history file.  Segments on a timeline
+ * with a smaller ID will be older than all segments on a timeline with a 
larger
+ * ID; the net result being that past timelines are given higher priority for
+ * archiving.  This seems okay, or at least not obviously worth changing.
  */
 static bool
 pgarch_readyXlog(char *xlog)
@@ -724,6 +724,7 @@ pgarch_readyXlog(char *xlog)
        DIR                *rldir;
        struct dirent *rlde;
        bool            found = false;
+       bool            historyFound = false;
 
        snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
        rldir = AllocateDir(XLogArchiveStatusDir);
@@ -737,12 +738,27 @@ pgarch_readyXlog(char *xlog)
                        strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen &&
                        strcmp(rlde->d_name + basenamelen, ".ready") == 0)
                {
-                       if (!found)
+                       /* Is this a history file? */
+                       bool history = basenamelen >= sizeof(".history") &&
+                               strcmp(rlde->d_name + (basenamelen - 
sizeof(".history") + 1),
+                                          ".history.ready") == 0;
+
+                       /*
+                        * If this is the first file or the first history file, 
copy it
+                        */
+                       if (!found || history && !historyFound)
                        {
                                strcpy(newxlog, rlde->d_name);
                                found = true;
+                               historyFound = history;
                        }
-                       else
+                       /*
+                        * Else compare and see if this file is alpabetically 
less than
+                        * what we already have.  If so, copy it.  History 
files always get
+                        * this comparison but other files only do when no 
history file has
+                        * been found yet.
+                        */
+                       else if (history || !history && !historyFound)
                        {
                                if (strcmp(rlde->d_name, newxlog) < 0)
                                        strcpy(newxlog, rlde->d_name);
-- 
2.17.2 (Apple Git-113)

Reply via email to