Re: Change pgarch_readyXlog() to return .history files first

2018-12-26 Thread David Steele

On 12/24/18 1:31 PM, Michael Paquier wrote:

On Sat, Dec 22, 2018 at 08:55:14AM +0900, Michael Paquier wrote:

Thanks for the lookups.  I can see that the patch applies without
conflicts down to 9.4, and based on the opinions gathered on this
thread back-patching this stuff is the consensus, based on input from
Kyotaro Horiguchi and David Steele (I don't mind much myself).  So,
any objections from others in doing so?


On REL9_4_STABLE, IsTLHistoryFileName() goes missing, so committed
"only" down to 9.5.  Thanks David for the patch and Horiguchi-san for
the review.


Thanks, Michael!

I'm not too worried about 9.4 since it is currently the oldest supported 
version.  HA users tend to be on the leading edge of the upgrade curve 
and others have the opportunity to upgrade if the reordering will help them.


--
-David
da...@pgmasters.net



Re: Change pgarch_readyXlog() to return .history files first

2018-12-24 Thread Michael Paquier
On Sat, Dec 22, 2018 at 08:55:14AM +0900, Michael Paquier wrote:
> Thanks for the lookups.  I can see that the patch applies without
> conflicts down to 9.4, and based on the opinions gathered on this
> thread back-patching this stuff is the consensus, based on input from
> Kyotaro Horiguchi and David Steele (I don't mind much myself).  So,
> any objections from others in doing so?

On REL9_4_STABLE, IsTLHistoryFileName() goes missing, so committed
"only" down to 9.5.  Thanks David for the patch and Horiguchi-san for
the review.
--
Michael


signature.asc
Description: PGP signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-21 Thread Michael Paquier
On Fri, Dec 21, 2018 at 08:17:12AM +0200, David Steele wrote:
> I thought about doing that, but wanted to focus on the task at hand.  It
> does save a strcpy and a bit of stack space, so seems like a win.
> 
> Overall, the patch looks good to me.  I think breaking up the if does make
> the code more readable.

Thanks for the lookups.  I can see that the patch applies without
conflicts down to 9.4, and based on the opinions gathered on this
thread back-patching this stuff is the consensus, based on input from
Kyotaro Horiguchi and David Steele (I don't mind much myself).  So,
any objections from others in doing so?
--
Michael


signature.asc
Description: PGP signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-20 Thread David Steele

On 12/21/18 6:49 AM, Kyotaro HORIGUCHI wrote:

"else if (!historyFound || ishistory)"




strcpy(xlog, newxlog);


The caller prepares sufficient memory for basename, and we no
longer copy ".ready" into newxlog. Douldn't we work directly on
xlog instead of allocating newxlog?


I thought about doing that, but wanted to focus on the task at hand.  It 
does save a strcpy and a bit of stack space, so seems like a win.


Overall, the patch looks good to me.  I think breaking up the if does 
make the code more readable.


Regards,
--
-David
da...@pgmasters.net



Re: Change pgarch_readyXlog() to return .history files first

2018-12-20 Thread Kyotaro HORIGUCHI
At Fri, 21 Dec 2018 14:17:25 +0900, Michael Paquier  wrote 
in <20181221051724.gg1...@paquier.xyz>
> On Fri, Dec 21, 2018 at 01:49:18PM +0900, Kyotaro HORIGUCHI wrote:
> > FWIW it seems to me a bug that making an inconsistent set of
> > files in archive directory.
> 
> Okay, point taken!  FWIW, I have no actual objections in not
> back-patching that.

I maybe(?) know.

> > At Fri, 21 Dec 2018 12:19:21 +0900, Michael Paquier  
> > wrote in <20181221031921.ge1...@paquier.xyz>
> >> On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote:
> >> > Good point.  The new patch uses IsTLHistoryFileName().
> >> 
> >> OK, I have been reviewing the patch and the logic is correct, still I
> >> could not resist reducing the number of inner if's for readability.  I
> > 
> > +1 basically. But I think that tail(name, 6) != ".ready" can
> > happen with a certain frequency but strspn(name) < basenamelen
> > rarely in the normal case.
> 
> So that +0.5 if "basically" means a partial agreement? :p

Mmm. No, +0.9.

> > In the else if condition, ishisotry must be false in the right
> > hand of ||. What we do here is ignoring non-history files once
> > history file found. (Just a logic condensing and it would be done
> > by compiler, though)
> 
> Yes, this can be simplified.  So let's do so.
> 
> > The caller prepares sufficient memory for basename, and we no
> > longer copy ".ready" into newxlog. Wouldn't we work directly on

Sorry for silly typo, but the 'W' was 'C', I meant:p

> > xlog instead of allocating newxlog?
> 
> Okay, let's simplify that as you suggest.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Change pgarch_readyXlog() to return .history files first

2018-12-20 Thread Michael Paquier
On Fri, Dec 21, 2018 at 01:49:18PM +0900, Kyotaro HORIGUCHI wrote:
> FWIW it seems to me a bug that making an inconsistent set of
> files in archive directory.

Okay, point taken!  FWIW, I have no actual objections in not
back-patching that.

> At Fri, 21 Dec 2018 12:19:21 +0900, Michael Paquier  
> wrote in <20181221031921.ge1...@paquier.xyz>
>> On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote:
>> > Good point.  The new patch uses IsTLHistoryFileName().
>> 
>> OK, I have been reviewing the patch and the logic is correct, still I
>> could not resist reducing the number of inner if's for readability.  I
> 
> +1 basically. But I think that tail(name, 6) != ".ready" can
> happen with a certain frequency but strspn(name) < basenamelen
> rarely in the normal case.

So that +0.5 if "basically" means a partial agreement? :p

> In the else if condition, ishisotry must be false in the right
> hand of ||. What we do here is ignoring non-history files once
> history file found. (Just a logic condensing and it would be done
> by compiler, though)

Yes, this can be simplified.  So let's do so.

> The caller prepares sufficient memory for basename, and we no
> longer copy ".ready" into newxlog. Wouldn't we work directly on
> xlog instead of allocating newxlog?

Okay, let's simplify that as you suggest.
--
Michael


signature.asc
Description: PGP signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-20 Thread Kyotaro HORIGUCHI
Hello.

FWIW it seems to me a bug that making an inconsistent set of
files in archive directory.

At Fri, 21 Dec 2018 12:19:21 +0900, Michael Paquier  wrote 
in <20181221031921.ge1...@paquier.xyz>
> On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote:
> > Good point.  The new patch uses IsTLHistoryFileName().
> 
> OK, I have been reviewing the patch and the logic is correct, still I
> could not resist reducing the number of inner if's for readability.  I

+1 basically. But I think that tail(name, 6) != ".ready" can
happen with a certain frequency but strspn(name) < basenamelen
rarely in the normal case.

> also did not like the high-jacking of rlde->d_name so instead let's use
> an intermediate variable to store the basename of a scanned entry.  The
> format of the if/elif with comments in-between was not really consistent
> with the common practice as well.  pg_indent has also been applied.
> 
> > Ah, I see.  Yes, that's exactly how I tested it, in addition to doing real
> > promotions.
> 
> OK, so am I doing.
> 
> Attached is an updated patch.  Does that look fine to you?  The base
> logic is unchanged, and after a promotion history files get archived
> before the last partial segment.

Renaming history to ishistory looks good.

if (!found || (ishistory && !historyFound))
{
  /* init */
  found = true;
  historyFound = ishistory;
}
else if (ishistory || (!ishstory && !historyFound))
 /* compare/replace */

In the else if condition, ishisotry must be false in the right
hand of ||. What we do here is ignoring non-history files once
history file found. (Just a logic condensing and it would be done
by compiler, though)

"else if (!historyFound || ishistory)"



>   strcpy(xlog, newxlog);

The caller prepares sufficient memory for basename, and we no
longer copy ".ready" into newxlog. Douldn't we work directly on
xlog instead of allocating newxlog?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Change pgarch_readyXlog() to return .history files first

2018-12-20 Thread Michael Paquier
On Thu, Dec 20, 2018 at 01:57:30PM +0200, David Steele wrote:
> Good point.  The new patch uses IsTLHistoryFileName().

OK, I have been reviewing the patch and the logic is correct, still I
could not resist reducing the number of inner if's for readability.  I
also did not like the high-jacking of rlde->d_name so instead let's use
an intermediate variable to store the basename of a scanned entry.  The
format of the if/elif with comments in-between was not really consistent
with the common practice as well.  pg_indent has also been applied.

> Ah, I see.  Yes, that's exactly how I tested it, in addition to doing real
> promotions.

OK, so am I doing.

Attached is an updated patch.  Does that look fine to you?  The base
logic is unchanged, and after a promotion history files get archived
before the last partial segment.
--
Michael
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index e88d545d65..bbd6311b35 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -695,11 +695,12 @@ pgarch_archiveXlog(char *xlog)
  * 2) because the oldest ones will sooner become candidates for
  * recycling at time of checkpoint
  *
- * NOTE: the "oldest" comparison will presently consider all segments of
- * a timeline with a smaller ID to be older than all segments of a timeline
- * with a larger ID; the net result being that past timelines are given
- * higher priority for archiving.  This seems okay, or at least not
- * obviously worth changing.
+ * NOTE: the "oldest" comparison will consider any .history file to be older
+ * than any other file except another .history file.  Segments on a timeline
+ * with a smaller ID will be older than all segments on a timeline with a
+ * larger ID; the net result being that past timelines are given higher
+ * priority for archiving.  This seems okay, or at least not obviously worth
+ * changing.
  */
 static bool
 pgarch_readyXlog(char *xlog)
@@ -715,6 +716,7 @@ pgarch_readyXlog(char *xlog)
 	DIR		   *rldir;
 	struct dirent *rlde;
 	bool		found = false;
+	bool		historyFound = false;
 
 	snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
 	rldir = AllocateDir(XLogArchiveStatusDir);
@@ -722,32 +724,54 @@ pgarch_readyXlog(char *xlog)
 	while ((rlde = ReadDir(rldir, XLogArchiveStatusDir)) != NULL)
 	{
 		int			basenamelen = (int) strlen(rlde->d_name) - 6;
+		char		basename[MAX_XFN_CHARS + 1];
+		bool		ishistory;
 
-		if (basenamelen >= MIN_XFN_CHARS &&
-			basenamelen <= MAX_XFN_CHARS &&
-			strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen &&
-			strcmp(rlde->d_name + basenamelen, ".ready") == 0)
+		/* Ignore entries with unexpected number of characters */
+		if (basenamelen < MIN_XFN_CHARS ||
+			basenamelen > MAX_XFN_CHARS)
+			continue;
+
+		/* Ignore entries with unexpected characters */
+		if (strspn(rlde->d_name, VALID_XFN_CHARS) < basenamelen)
+			continue;
+
+		/* Ignore anything not suffixed with .ready */
+		if (strcmp(rlde->d_name + basenamelen, ".ready") != 0)
+			continue;
+
+		/* Truncate off the .ready */
+		memcpy(basename, rlde->d_name, basenamelen);
+		basename[basenamelen] = '\0';
+
+		/* Is this a history file? */
+		ishistory = IsTLHistoryFileName(basename);
+
+		/*
+		 * Consume the file to archive.  History files have the highest
+		 * priority.  If this is the first file or the first history file
+		 * ever, copy it.  In the presence of a history file already chosen as
+		 * target, ignore all other files except history files which have been
+		 * generated for an older timeline than what is already chosen as
+		 * target to archive.
+		 */
+		if (!found || (ishistory && !historyFound))
 		{
-			if (!found)
-			{
-strcpy(newxlog, rlde->d_name);
-found = true;
-			}
-			else
-			{
-if (strcmp(rlde->d_name, newxlog) < 0)
-	strcpy(newxlog, rlde->d_name);
-			}
+			strcpy(newxlog, basename);
+			found = true;
+			historyFound = ishistory;
+		}
+		else if (ishistory || (!ishistory && !historyFound))
+		{
+			if (strcmp(basename, newxlog) < 0)
+strcpy(newxlog, basename);
 		}
 	}
 	FreeDir(rldir);
 
 	if (found)
-	{
-		/* truncate off the .ready */
-		newxlog[strlen(newxlog) - 6] = '\0';
 		strcpy(xlog, newxlog);
-	}
+
 	return found;
 }
 


signature.asc
Description: PGP signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-20 Thread David Steele

On 12/15/18 2:10 AM, Michael Paquier wrote:

On Fri, Dec 14, 2018 at 08:43:20AM -0500, David Steele wrote:

On 12/13/18 7:15 PM, Michael Paquier wrote:



Or you could just use IsTLHistoryFileName here?


We'd have to truncate .ready off the string to make that work, which
seems easy enough.  Is that what you were thinking?


Yes, that's the idea.  pgarch_readyXlog returns the segment name which
should be archived, so you could just compute it after detecting a
.ready file.


One thing to consider is the check above is more efficient than
IsTLHistoryFileName() and it potentially gets run a lot.


This check misses strspn(), so any file finishing with .history would
get eaten even if that's unlikely to happen.


Good point.  The new patch uses IsTLHistoryFileName().


If one wants to simply check this code, you can just create dummy orphan
files in archive_status and see in which order they get removed.


Seems awfully racy.  Are there currently any tests like this for the
archiver that I can look at extending?


Sorry for the confusion, I was referring to manual testing here.


Ah, I see.  Yes, that's exactly how I tested it, in addition to doing 
real promotions.



Thinking about it, we could have an automatic test to check for the file
order pattern by creating dummy files, starting the server with archiver
enabled, and then parse the logs as orphan .ready files would get
removed in the order their archiving is attempted with one WARNING entry
generated for each.  I am not sure if that is worth a test though.


Yes, parsing the logs was the best thing I could think of, too.

--
-David
da...@pgmasters.net
>From 1987a90b724506c7d9e453d9a976e3f982f625ec Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 20 Dec 2018 13:28:51 +0200
Subject: [PATCH] Change pgarch_readyXlog() to return .history files first.

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000100010001.partial will be archived before 0002.history.

This appears harmless, but the .history files are what other potential primaries
use to decide what timeline they should pick.  The additional latency of
compressing/transferring the much larger partial file means that archiving of
the .history file is delayed and greatly increases the chance that another
primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order) to reduce
the window where this can happen.  This won't prevent all conflicts, but it is a
simple change and should greatly reduce real-world occurrences.
---
 src/backend/postmaster/pgarch.c | 36 +++--
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index e88d545d65..50dc2027b1 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -695,11 +695,11 @@ pgarch_archiveXlog(char *xlog)
  * 2) because the oldest ones will sooner become candidates for
  * recycling at time of checkpoint
  *
- * NOTE: the "oldest" comparison will presently consider all segments of
- * a timeline with a smaller ID to be older than all segments of a timeline
- * with a larger ID; the net result being that past timelines are given
- * higher priority for archiving.  This seems okay, or at least not
- * obviously worth changing.
+ * NOTE: the "oldest" comparison will consider any .history file to be older
+ * than any other file except another .history file.  Segments on a timeline
+ * with a smaller ID will be older than all segments on a timeline with a 
larger
+ * ID; the net result being that past timelines are given higher priority for
+ * archiving.  This seems okay, or at least not obviously worth changing.
  */
 static bool
 pgarch_readyXlog(char *xlog)
@@ -715,6 +715,7 @@ pgarch_readyXlog(char *xlog)
DIR*rldir;
struct dirent *rlde;
boolfound = false;
+   boolhistoryFound = false;
 
snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
rldir = AllocateDir(XLogArchiveStatusDir);
@@ -728,12 +729,28 @@ pgarch_readyXlog(char *xlog)
strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen &&
strcmp(rlde->d_name + basenamelen, ".ready") == 0)
{
-   if (!found)
+/* Truncate off the .ready */
+rlde->d_name[basenamelen] = '\0';
+
+   /* Is this a history file? */
+   bool history = IsTLHistoryFileName(rlde->d_name);
+
+   /*
+* If this is the first file or the first history file, 
copy it
+*/
+   if (!found || (history && !historyFound))
{
s

Re: Change pgarch_readyXlog() to return .history files first

2018-12-14 Thread Michael Paquier
On Fri, Dec 14, 2018 at 08:43:20AM -0500, David Steele wrote:
> On 12/13/18 7:15 PM, Michael Paquier wrote:
>> I would like to hear opinion from other though if we should consider
>> that as an improvement or an actual bug fix.  Changing the order of the
>> files to map with what the startup process does when promoting does not
>> sound like a bug fix to me, still this is not really invasive, so we
>> could really consider it worth back-patching to reduce common pain from
>> users when it comes to timeline handling.
> 
> I think an argument can be made that it is a bug (ish).  Postgres
> generates the files in one order, and they get archived in a different
> order.

I am not completely sure either.  In my experience, if there is any
doubt on such definitions the best answer is to not backpatch.

>> Or you could just use IsTLHistoryFileName here?
> 
> We'd have to truncate .ready off the string to make that work, which
> seems easy enough.  Is that what you were thinking?

Yes, that's the idea.  pgarch_readyXlog returns the segment name which
should be archived, so you could just compute it after detecting a
.ready file.

> One thing to consider is the check above is more efficient than
> IsTLHistoryFileName() and it potentially gets run a lot.

This check misses strspn(), so any file finishing with .history would
get eaten even if that's unlikely to happen.

>> If one wants to simply check this code, you can just create dummy orphan
>> files in archive_status and see in which order they get removed.
> 
> Seems awfully racy.  Are there currently any tests like this for the
> archiver that I can look at extending?

Sorry for the confusion, I was referring to manual testing here.
Thinking about it, we could have an automatic test to check for the file
order pattern by creating dummy files, starting the server with archiver
enabled, and then parse the logs as orphan .ready files would get
removed in the order their archiving is attempted with one WARNING entry
generated for each.  I am not sure if that is worth a test though.
--
Michael


signature.asc
Description: PGP signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-14 Thread David Steele
On 12/13/18 7:15 PM, Michael Paquier wrote:
> On Thu, Dec 13, 2018 at 11:53:53AM -0500, David Steele wrote:
>> I also think we should consider back-patching this change.  It's hard to
>> imagine that archive commands would have trouble with this reordering
>> and the current ordering causes real pain in HA clusters.
> 
> I would like to hear opinion from other though if we should consider
> that as an improvement or an actual bug fix.  Changing the order of the
> files to map with what the startup process does when promoting does not
> sound like a bug fix to me, still this is not really invasive, so we
> could really consider it worth back-patching to reduce common pain from
> users when it comes to timeline handling.

I think an argument can be made that it is a bug (ish).  Postgres
generates the files in one order, and they get archived in a different
order.

>> -if (!found)
>> +/* Is this a history file? */
>> +bool history = basenamelen >= sizeof(".history") &&
>> +strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1),
>> +   ".history.ready") == 0;
> 
> Or you could just use IsTLHistoryFileName here?

We'd have to truncate .ready off the string to make that work, which
seems easy enough.  Is that what you were thinking?

One thing to consider is the check above is more efficient than
IsTLHistoryFileName() and it potentially gets run a lot.

> If one wants to simply check this code, you can just create dummy orphan
> files in archive_status and see in which order they get removed.

Seems awfully racy.  Are there currently any tests like this for the
archiver that I can look at extending?

Regards,
-- 
-David
da...@pgmasters.net



signature.asc
Description: OpenPGP digital signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-13 Thread Michael Paquier
On Thu, Dec 13, 2018 at 11:53:53AM -0500, David Steele wrote:
> I also think we should consider back-patching this change.  It's hard to
> imagine that archive commands would have trouble with this reordering
> and the current ordering causes real pain in HA clusters.

I would like to hear opinion from other though if we should consider
that as an improvement or an actual bug fix.  Changing the order of the
files to map with what the startup process does when promoting does not
sound like a bug fix to me, still this is not really invasive, so we
could really consider it worth back-patching to reduce common pain from
users when it comes to timeline handling.

> -if (!found)
> +/* Is this a history file? */
> +bool history = basenamelen >= sizeof(".history") &&
> +strcmp(rlde->d_name + (basenamelen - sizeof(".history") + 1),
> +   ".history.ready") == 0;

Or you could just use IsTLHistoryFileName here?

If one wants to simply check this code, you can just create dummy orphan
files in archive_status and see in which order they get removed.
--
Michael


signature.asc
Description: PGP signature


Re: Change pgarch_readyXlog() to return .history files first

2018-12-13 Thread David Steele
On 12/13/18 11:53 AM, David Steele wrote:
> Hackers,
> 
> The alphabetical ordering of pgarch_readyXlog() means that on promotion
> 000100010001.partial will be archived before 0002.history.
> 
> This appears harmless, but the .history files are what other potential
> primaries use to decide what timeline they should pick.  The additional
> latency of compressing/transferring the much larger partial file means
> that archiving of the .history file is delayed and greatly increases the
> chance that another primary will promote to the same timeline.
> 
> Teach pgarch_readyXlog() to return .history files first (and in order)
> to reduce the window where this can happen.  This won't prevent all
> conflicts, but it is a simple change and should greatly reduce
> real-world occurrences.
> 
> I also think we should consider back-patching this change.  It's hard to
> imagine that archive commands would have trouble with this reordering
> and the current ordering causes real pain in HA clusters.

Some gcc versions wanted more parens, so updated in attached.

-- 
-David
da...@pgmasters.net
From d5fd60b4e329fe99b2f6565355014040dcdcee26 Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 13 Dec 2018 12:58:27 -0500
Subject: [PATCH] Change pgarch_readyXlog() to return .history files first.

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000100010001.partial will be archived before 0002.history.

This appears harmless, but the .history files are what other potential primaries
use to decide what timeline they should pick.  The additional latency of
compressing/transferring the much larger partial file means that archiving of
the .history file is delayed and greatly increases the chance that another
primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order) to reduce
the window where this can happen.  This won't prevent all conflicts, but it is a
simple change and should greatly reduce real-world occurrences.
---
 src/backend/postmaster/pgarch.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 0dcf609f19..95898080d7 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -704,11 +704,11 @@ pgarch_archiveXlog(char *xlog)
  * 2) because the oldest ones will sooner become candidates for
  * recycling at time of checkpoint
  *
- * NOTE: the "oldest" comparison will presently consider all segments of
- * a timeline with a smaller ID to be older than all segments of a timeline
- * with a larger ID; the net result being that past timelines are given
- * higher priority for archiving.  This seems okay, or at least not
- * obviously worth changing.
+ * NOTE: the "oldest" comparison will consider any .history file to be older
+ * than any other file except another .history file.  Segments on a timeline
+ * with a smaller ID will be older than all segments on a timeline with a 
larger
+ * ID; the net result being that past timelines are given higher priority for
+ * archiving.  This seems okay, or at least not obviously worth changing.
  */
 static bool
 pgarch_readyXlog(char *xlog)
@@ -724,6 +724,7 @@ pgarch_readyXlog(char *xlog)
DIR*rldir;
struct dirent *rlde;
boolfound = false;
+   boolhistoryFound = false;
 
snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
rldir = AllocateDir(XLogArchiveStatusDir);
@@ -737,12 +738,27 @@ pgarch_readyXlog(char *xlog)
strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen &&
strcmp(rlde->d_name + basenamelen, ".ready") == 0)
{
-   if (!found)
+   /* Is this a history file? */
+   bool history = basenamelen >= sizeof(".history") &&
+   strcmp(rlde->d_name + (basenamelen - 
sizeof(".history") + 1),
+  ".history.ready") == 0;
+
+   /*
+* If this is the first file or the first history file, 
copy it
+*/
+   if (!found || (history && !historyFound))
{
strcpy(newxlog, rlde->d_name);
found = true;
+   historyFound = history;
}
-   else
+   /*
+* Else compare and see if this file is alpabetically 
less than
+* what we already have.  If so, c

Change pgarch_readyXlog() to return .history files first

2018-12-13 Thread David Steele
Hackers,

The alphabetical ordering of pgarch_readyXlog() means that on promotion
000100010001.partial will be archived before 0002.history.

This appears harmless, but the .history files are what other potential
primaries use to decide what timeline they should pick.  The additional
latency of compressing/transferring the much larger partial file means
that archiving of the .history file is delayed and greatly increases the
chance that another primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order)
to reduce the window where this can happen.  This won't prevent all
conflicts, but it is a simple change and should greatly reduce
real-world occurrences.

I also think we should consider back-patching this change.  It's hard to
imagine that archive commands would have trouble with this reordering
and the current ordering causes real pain in HA clusters.

Regards,
-- 
-David
da...@pgmasters.net
From 2279697ce828e825066065e3e40c9926633523ad Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Thu, 13 Dec 2018 11:00:37 -0500
Subject: [PATCH] Change pgarch_readyXlog() to return .history files first

The alphabetical ordering of pgarch_readyXlog() means that on promotion 
000100010001.partial will be archived before 0002.history.

This appears harmless, but the .history files are what other potential 
primaries 
use to decide what timeline they should pick.  The additional latency of 
compressing/transferring the much larger partial file means that archiving of 
the .history file is delayed and greatly increases the chance that another 
primary will promote to the same timeline.

Teach pgarch_readyXlog() to return .history files first (and in order) to 
reduce 
the window where this can happen.  This won't prevent all conflicts, but it is 
a 
simple change and should greatly reduce real-world occurrences.
---
 src/backend/postmaster/pgarch.c | 30 +++---
 1 file changed, 23 insertions(+), 7 deletions(-)

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 0dcf609f19..5dbab6a3d7 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -704,11 +704,11 @@ pgarch_archiveXlog(char *xlog)
  * 2) because the oldest ones will sooner become candidates for
  * recycling at time of checkpoint
  *
- * NOTE: the "oldest" comparison will presently consider all segments of
- * a timeline with a smaller ID to be older than all segments of a timeline
- * with a larger ID; the net result being that past timelines are given
- * higher priority for archiving.  This seems okay, or at least not
- * obviously worth changing.
+ * NOTE: the "oldest" comparison will consider any .history file to be older
+ * than any other file except another .history file.  Segments on a timeline
+ * with a smaller ID will be older than all segments on a timeline with a 
larger
+ * ID; the net result being that past timelines are given higher priority for
+ * archiving.  This seems okay, or at least not obviously worth changing.
  */
 static bool
 pgarch_readyXlog(char *xlog)
@@ -724,6 +724,7 @@ pgarch_readyXlog(char *xlog)
DIR*rldir;
struct dirent *rlde;
boolfound = false;
+   boolhistoryFound = false;
 
snprintf(XLogArchiveStatusDir, MAXPGPATH, XLOGDIR "/archive_status");
rldir = AllocateDir(XLogArchiveStatusDir);
@@ -737,12 +738,27 @@ pgarch_readyXlog(char *xlog)
strspn(rlde->d_name, VALID_XFN_CHARS) >= basenamelen &&
strcmp(rlde->d_name + basenamelen, ".ready") == 0)
{
-   if (!found)
+   /* Is this a history file? */
+   bool history = basenamelen >= sizeof(".history") &&
+   strcmp(rlde->d_name + (basenamelen - 
sizeof(".history") + 1),
+  ".history.ready") == 0;
+
+   /*
+* If this is the first file or the first history file, 
copy it
+*/
+   if (!found || history && !historyFound)
{
strcpy(newxlog, rlde->d_name);
found = true;
+   historyFound = history;
}
-   else
+   /*
+* Else compare and see if this file is alpabetically 
less than
+* what we already have.  If so, copy it.  History 
files always get
+* this comparison but other files only do when no 
history file has
+* b