Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-26 Thread Simon Riggs
On 13 January 2017 at 10:17, Ants Aasma  wrote:
> On 5 Jan 2017 2:54 a.m., "Craig Ringer"  wrote:
>> Ants, do you think you'll have a chance to convert your shell script
>> test into a TAP test in src/test/recovery?
>>
>> Simon has said he would like to commit this fix. I'd personally be
>> happier if it had a test to go with it.
>>
>> You could probably just add to src/test/recover/t/001 which now
>> contains my additions for hot standby.
>
> Do you feel the test in the attached patch is enough or would you like
> to see anything else covered?

That looks good, thanks for the patch.

Applying in next few hours, barring objections; then backpatching code
(without tests).

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-13 Thread Ants Aasma
On 5 Jan 2017 2:54 a.m., "Craig Ringer"  wrote:
> Ants, do you think you'll have a chance to convert your shell script
> test into a TAP test in src/test/recovery?
>
> Simon has said he would like to commit this fix. I'd personally be
> happier if it had a test to go with it.
>
> You could probably just add to src/test/recover/t/001 which now
> contains my additions for hot standby.

Do you feel the test in the attached patch is enough or would you like
to see anything else covered?

Regards,
Ants Aasma
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index c6b54ec..0ab936f 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,10 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot. (If we're not using a slot it's harmless to
+ * send a feedback message explicitly setting InvalidTransactionId).
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1170,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	/* initially true so we always send at least one feedback message */
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,14 +1196,17 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
+	 * If Hot Standby is not yet accepting connections there is nothing to
+	 * send. Check this after the interval has expired to reduce number of
+	 * calls.
+	 *
+	 * Bailing out here also ensures that we don't send feedback until we've
+	 * read our own replication slot state, so we don't tell the master to
+	 * discard needed xmin or catalog_xmin from any slots that may exist
+	 * on this replica.
 	 */
 	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
 		return;
-	}
 
 	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index eef512d..7fb2e9e 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 22;
+use Test::More tests => 24;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -161,3 +161,22 @@ is($catalog_xmin, '', 'non-cascaded slot xmin still null with hs_feedback reset'
 ($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
 is($xmin, '', 'cascaded slot xmin null with hs feedback reset');
 is($catalog_xmin, '', 'cascaded slot xmin still null with hs_feedback reset');
+
+diag "re-enabling hot_standby_feedback and disabling while stopped";
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = on;');
+$node_standby_2->reload;
+
+$node_master->safe_psql('postgres', qq[INSERT INTO tab_int VALUES (11000);]);
+replay_check();
+
+$node_standby_2->safe_psql('postgres', 'ALTER SYSTEM SET hot_standby_feedback = off;');
+$node_standby_2->stop;
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+isnt($xmin, '', 'cascaded slot xmin non-null with postgres shut down');
+
+# Xmin from a previous run should be cleared on startup.
+$node_standby_2->start;
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin reset after startup with hs feedback reset');

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-05 Thread Ants Aasma
On 5 Jan 2017 2:54 a.m., "Craig Ringer"  wrote:

On 2 January 2017 at 22:24, Craig Ringer  wrote:
>
>
> On 2 Jan. 2017 20:20, "Simon Riggs"  wrote:
>
> On 21 December 2016 at 13:23, Simon Riggs  wrote:
>
>> Fix it up and I'll commit. Thanks for the report.
>
> I was hoping for some more effort from Ants to correct this.
>
> I'll commit Craig's new tests for hs feedback before this, so it can
> go in with a Tap test, so I imagine we're about a week away from
> commit on this.
>
>
> I posted a revised version of Ants's patch that passes the shell script
> test.
>
> A TAP  test would likely make sene though, I agree.


Ants, do you think you'll have a chance to convert your shell script
test into a TAP test in src/test/recovery?

Simon has said he would like to commit this fix. I'd personally be
happier if it had a test to go with it.

You could probably just add to src/test/recover/t/001 which now
contains my additions for hot standby.


I'm travelling right now, but I should be able to give it a shot next week.

Regards,
Ants Aasma


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-04 Thread Craig Ringer
On 2 January 2017 at 22:24, Craig Ringer  wrote:
>
>
> On 2 Jan. 2017 20:20, "Simon Riggs"  wrote:
>
> On 21 December 2016 at 13:23, Simon Riggs  wrote:
>
>> Fix it up and I'll commit. Thanks for the report.
>
> I was hoping for some more effort from Ants to correct this.
>
> I'll commit Craig's new tests for hs feedback before this, so it can
> go in with a Tap test, so I imagine we're about a week away from
> commit on this.
>
>
> I posted a revised version of Ants's patch that passes the shell script
> test.
>
> A TAP  test would likely make sene though, I agree.


Ants, do you think you'll have a chance to convert your shell script
test into a TAP test in src/test/recovery?

Simon has said he would like to commit this fix. I'd personally be
happier if it had a test to go with it.

You could probably just add to src/test/recover/t/001 which now
contains my additions for hot standby.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-02 Thread Craig Ringer
On 2 Jan. 2017 20:20, "Simon Riggs"  wrote:

On 21 December 2016 at 13:23, Simon Riggs  wrote:

> Fix it up and I'll commit. Thanks for the report.

I was hoping for some more effort from Ants to correct this.

I'll commit Craig's new tests for hs feedback before this, so it can
go in with a Tap test, so I imagine we're about a week away from
commit on this.


I posted a revised version of Ants's patch that passes the shell script
test.

A TAP  test would likely make sene though, I agree.


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2017-01-02 Thread Simon Riggs
On 21 December 2016 at 13:23, Simon Riggs  wrote:

> Fix it up and I'll commit. Thanks for the report.

I was hoping for some more effort from Ants to correct this.

I'll commit Craig's new tests for hs feedback before this, so it can
go in with a Tap test, so I imagine we're about a week away from
commit on this.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-21 Thread Craig Ringer
On 21 December 2016 at 21:23, Simon Riggs  wrote:

> Valid bug, but this still ain't right. We don't want to turn feedback
> on until HS is active, but we do want to turn it off once whether or
> not HS is active yet.

I just posted an update, though I forgot to add you to the direct Cc
list. It passes the provided test script.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-21 Thread Craig Ringer
On 21 December 2016 at 21:03, Ants Aasma  wrote:

> There was a !HotStandbyActive() check there previously, I was not sure
> if it was only to avoid sending useless messages or was necessary
> because something was not initialized otherwise. Looks like it is not
> needed and can be removed. Revised patch that does so attached.

Ah, see, I'm blind. Too much multi-tasking. Turns out patch review
with a toddler's help is hard ;)

HotStandbyActive() call is actually just checking if we're still in
crash or archive recovery, per

/*
 * SharedHotStandbyActive indicates if we're still in crash or archive
 * recovery.  Protected by info_lck.
 */
boolSharedHotStandbyActive;


so it is appropriate to defer any hot standby feedback until then. By
that point we've definitely finished setting up replication slots for
one thing, and know for sure if we have something useful to say and
won't send the wrong thing.

It looks to me like we should continue to bail out if !HotStandbyActive() . The

Assert(!master_has_standby_xmin)

can go, since there's now a valid case where master_has_standby_xmin
can be true before HotStandbyActive() is true.

Here's a revised version. I haven't run it against your tests yet.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 5d86d42e30ccb3dd3e1b227b102384762d66e9dd Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 21 Dec 2016 21:28:32 +0800
Subject: [PATCH] Reset xmin if hot_standby_feedback is turned off while
 shutdown

If the user turns hot_standby_feedback off while we're shut down or the
walreciever is restarting, we won't remember that we sent an xmin to the
server. This never used to matter since the xmin was tied to the walsender
backend PGPROC, but now that we have replication slots it's persistent across
connections.

Always send at least one hot_standby_feedback message after walreceiver
startup, even if hot_standby_feedback is off, to ensure any saved
slot xmin from a prior connection is cleared.

Ants Aasma, Craig Ringer
---
 src/backend/replication/walreceiver.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index cc3cf7d..11de996 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,10 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot. (If we're not using a slot it's harmless to
+ * send a feedback message explicitly setting InvalidTransactionId).
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1170,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	/* initially true so we always send at least one feedback message */
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,14 +1196,17 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
+	 * If Hot Standby is not yet accepting connections there is nothing to
+	 * send. Check this after the interval has expired to reduce number of
+	 * calls.
+	 *
+	 * Bailing out here also ensures that we don't send feedback until we've
+	 * read our own replication slot state, so we don't tell the master to
+	 * discard needed xmin or catalog_xmin from any slots that may exist
+	 * on this replica.
 	 */
 	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
 		return;
-	}
 
 	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
-- 
2.5.5


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-21 Thread Simon Riggs
On 21 December 2016 at 13:03, Ants Aasma  wrote:
> On Wed, Dec 21, 2016 at 2:09 PM, Craig Ringer  wrote:
>> On 21 December 2016 at 15:40, Ants Aasma  wrote:
>>
 So -1 on this part of the patch, unless there's something I've 
 misunderstood.
>>>
>>> Currently there was no feedback sent if hot standby was not active. I
>>> was not sure if it was safe to call GetOldestXmin() in that case.
>>> However I did not consider cascading replica slots wanting to hold
>>> back xmin, where resetting the parents xmin is indeed wrong. Do you
>>> know if GetOldestXmin() is safe at this point and we can just remove
>>> the HotStandbyActive() check? Otherwise I think the correct approach
>>> is to move the check and return inside the hot_standby_feedback case
>>> like this:
>>>
>>> if (hot_standby_feedback)
>>> {
>>> if (!HotStandbyActive())
>>>return;
>>
>> I feel like I'm missing something obvious here. If we force sending
>> hot standby feedback at least once, by assuming
>> master_has_standby_xmin = true at startup, why isn't that sufficient?
>> We'll send InvalidTransactionId if hot_standby_feedback is off. Isn't
>> that the point?
>>
>> It's safe to call GetOldestXmin pretty much as soon as we load the
>> recovery start checkpoint. It won't consider the state of replication
>> slots until later in startup, but that's a pre-existing flaw that
>> should be addressed separately.
>>
>> Why do we need to do more than master_has_standby_xmin = true ?
>
> There was a !HotStandbyActive() check there previously, I was not sure
> if it was only to avoid sending useless messages or was necessary
> because something was not initialized otherwise. Looks like it is not
> needed and can be removed. Revised patch that does so attached.

Valid bug, but this still ain't right. We don't want to turn feedback
on until HS is active, but we do want to turn it off once whether or
not HS is active yet.

We need a full detailed comment explaining this.

Fix it up and I'll commit. Thanks for the report.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-21 Thread Ants Aasma
On Wed, Dec 21, 2016 at 2:09 PM, Craig Ringer  wrote:
> On 21 December 2016 at 15:40, Ants Aasma  wrote:
>
>>> So -1 on this part of the patch, unless there's something I've 
>>> misunderstood.
>>
>> Currently there was no feedback sent if hot standby was not active. I
>> was not sure if it was safe to call GetOldestXmin() in that case.
>> However I did not consider cascading replica slots wanting to hold
>> back xmin, where resetting the parents xmin is indeed wrong. Do you
>> know if GetOldestXmin() is safe at this point and we can just remove
>> the HotStandbyActive() check? Otherwise I think the correct approach
>> is to move the check and return inside the hot_standby_feedback case
>> like this:
>>
>> if (hot_standby_feedback)
>> {
>> if (!HotStandbyActive())
>>return;
>
> I feel like I'm missing something obvious here. If we force sending
> hot standby feedback at least once, by assuming
> master_has_standby_xmin = true at startup, why isn't that sufficient?
> We'll send InvalidTransactionId if hot_standby_feedback is off. Isn't
> that the point?
>
> It's safe to call GetOldestXmin pretty much as soon as we load the
> recovery start checkpoint. It won't consider the state of replication
> slots until later in startup, but that's a pre-existing flaw that
> should be addressed separately.
>
> Why do we need to do more than master_has_standby_xmin = true ?

There was a !HotStandbyActive() check there previously, I was not sure
if it was only to avoid sending useless messages or was necessary
because something was not initialized otherwise. Looks like it is not
needed and can be removed. Revised patch that does so attached.

Regards,
Ants Aasma
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index cc3cf7d..84ffa91 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,9 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot.
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1169,8 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	/* initially true so we always send at least one feedback message */
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,16 +1195,6 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
-	 */
-	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
-		return;
-	}
-
-	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
 	 * everything else has been checked.
 	 */

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-21 Thread Craig Ringer
On 21 December 2016 at 15:40, Ants Aasma  wrote:

>> So -1 on this part of the patch, unless there's something I've misunderstood.
>
> Currently there was no feedback sent if hot standby was not active. I
> was not sure if it was safe to call GetOldestXmin() in that case.
> However I did not consider cascading replica slots wanting to hold
> back xmin, where resetting the parents xmin is indeed wrong. Do you
> know if GetOldestXmin() is safe at this point and we can just remove
> the HotStandbyActive() check? Otherwise I think the correct approach
> is to move the check and return inside the hot_standby_feedback case
> like this:
>
> if (hot_standby_feedback)
> {
> if (!HotStandbyActive())
>return;

I feel like I'm missing something obvious here. If we force sending
hot standby feedback at least once, by assuming
master_has_standby_xmin = true at startup, why isn't that sufficient?
We'll send InvalidTransactionId if hot_standby_feedback is off. Isn't
that the point?

It's safe to call GetOldestXmin pretty much as soon as we load the
recovery start checkpoint. It won't consider the state of replication
slots until later in startup, but that's a pre-existing flaw that
should be addressed separately.

Why do we need to do more than master_has_standby_xmin = true ?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-20 Thread Ants Aasma
On Wed, Dec 21, 2016 at 4:14 AM, Craig Ringer  wrote:
> Re the patch, I don't like
>
> -   static bool master_has_standby_xmin = false;
> +   static bool master_has_standby_xmin = true;
>
> without any comment. It's addressed in the comment changes on the
> header func, but the link isn't obvious. Maybe a oneliner to say
> "ensure we always send at least one feedback message" ?

Will fix.

> I think this part of the patch is correct and useful.
>
>
>
> I don't see the point of
>
> +*
> +* If Hot Standby is not yet active we reset the xmin value. Check 
> this
> +* after the interval has expired to reduce number of calls.
>  */
> -   if (hot_standby_feedback)
> +   if (hot_standby_feedback && HotStandbyActive())
>
> though. Forcing feedback to send InvalidTransactionId until hot
> standby feedback is actively running seems counterproductive; we want
> to lock in feedback as soon as possible, not wait until we're
> accepting transactions. Simon and I are in fact working on changes to
> do the opposite of what you've got here and ensure that we don't allow
> hot standby connections until we know hot_standby_feedback is in
> effect and a usable xmin is locked in. That way the master won't
> remove tuples we need as soon as we start an xact and cause a conflict
> with recovery.
>
> If there are no running xacts, GetOldestXmin() will return the slot
> xmin if any. We should definitely not be clearing that just because
> we're not accepting hot standby connections yet; we want to make very
> sure it remains in effect unless the user explicitly de-configures hot
> standby.
>
>  (There's another pre-exisitng problem there, where we can start the
> walsender before slots are initialized, that I'll be addressing
> separately).
>
> If there's no slot xmin either, we'll send nextXid. That's a sensible
> starting point for hot standby feedback until we start some
> transactions.
>
> So -1 on this part of the patch, unless there's something I've misunderstood.

Currently there was no feedback sent if hot standby was not active. I
was not sure if it was safe to call GetOldestXmin() in that case.
However I did not consider cascading replica slots wanting to hold
back xmin, where resetting the parents xmin is indeed wrong. Do you
know if GetOldestXmin() is safe at this point and we can just remove
the HotStandbyActive() check? Otherwise I think the correct approach
is to move the check and return inside the hot_standby_feedback case
like this:

if (hot_standby_feedback)
{
if (!HotStandbyActive())
   return;

Regards,
Ants Aasma


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-20 Thread Michael Paquier
On Wed, Dec 21, 2016 at 11:14 AM, Craig Ringer  wrote:
> I didn't see this in the CF app so I created it in
> https://commitfest.postgresql.org/12/ as
> https://commitfest.postgresql.org/12/924/ . I couldn't find your
> PostgreSQL community username, so please log in and set yourself as
> author. I have set patch state to "waiting on author" pending your
> addressing these remarks.

You need to import the user first, which is what I did and I have
added the author name correctly.
(Glad you jumped on this patch, I was just going to begin a lookup. So
that's a nice timing.)
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-20 Thread Craig Ringer
On 21 December 2016 at 00:52, Ants Aasma  wrote:

> The simple fix seems to be to always send out at least one feedback
> message on each connect regardless of hot_standby_feedback setting.

I agree.

> Patch attached. Looks like this goes back to version 9.4. It could
> conceivably cause issues for replication middleware that does not know
> how to handle hot standby feedback messages. Not sure if any exist and
> if that is a concern.

If it exists it's already broken. Not our problem IMO.

> A shell script to reproduce the problem is also attached, adjust the
> PGPATH variable to your postgres install and run in an empty
> directory.

I wrote some TAP tests for 9.6 (they're on the logical decoding on
standby thread) that can be extended to check this. Once they're
committed we can add a test for this change.

Re the patch, I don't like

-   static bool master_has_standby_xmin = false;
+   static bool master_has_standby_xmin = true;

without any comment. It's addressed in the comment changes on the
header func, but the link isn't obvious. Maybe a oneliner to say
"ensure we always send at least one feedback message" ?

I think this part of the patch is correct and useful.



I don't see the point of

+*
+* If Hot Standby is not yet active we reset the xmin value. Check this
+* after the interval has expired to reduce number of calls.
 */
-   if (hot_standby_feedback)
+   if (hot_standby_feedback && HotStandbyActive())

though. Forcing feedback to send InvalidTransactionId until hot
standby feedback is actively running seems counterproductive; we want
to lock in feedback as soon as possible, not wait until we're
accepting transactions. Simon and I are in fact working on changes to
do the opposite of what you've got here and ensure that we don't allow
hot standby connections until we know hot_standby_feedback is in
effect and a usable xmin is locked in. That way the master won't
remove tuples we need as soon as we start an xact and cause a conflict
with recovery.

If there are no running xacts, GetOldestXmin() will return the slot
xmin if any. We should definitely not be clearing that just because
we're not accepting hot standby connections yet; we want to make very
sure it remains in effect unless the user explicitly de-configures hot
standby.

 (There's another pre-exisitng problem there, where we can start the
walsender before slots are initialized, that I'll be addressing
separately).

If there's no slot xmin either, we'll send nextXid. That's a sensible
starting point for hot standby feedback until we start some
transactions.

So -1 on this part of the patch, unless there's something I've misunderstood.


I didn't see this in the CF app so I created it in
https://commitfest.postgresql.org/12/ as
https://commitfest.postgresql.org/12/924/ . I couldn't find your
PostgreSQL community username, so please log in and set yourself as
author. I have set patch state to "waiting on author" pending your
addressing these remarks.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Replication slot xmin is not reset if HS feedback is turned off while standby is shut down

2016-12-20 Thread Ants Aasma
I just had a client issue with table bloat that I traced back to a
stale xmin value in a replication slot. xmin value from hot standby
feedback is stored in replication slot and used for vacuum xmin
calculation. If hot standby feedback is turned off while walreceiver
is active then the xmin gets reset by HS feedback message containing
InvalidTransactionId. However, if feedback gets turned off while
standby is shut down this message never gets sent and a stale value
gets left behind in the replication slot holding back vacuum.

The simple fix seems to be to always send out at least one feedback
message on each connect regardless of hot_standby_feedback setting.
Patch attached. Looks like this goes back to version 9.4. It could
conceivably cause issues for replication middleware that does not know
how to handle hot standby feedback messages. Not sure if any exist and
if that is a concern.

A shell script to reproduce the problem is also attached, adjust the
PGPATH variable to your postgres install and run in an empty
directory.

Regards,
Ants Aasma
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index cc3cf7d..31333ec 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -1157,7 +1157,9 @@ XLogWalRcvSendReply(bool force, bool requestReply)
  * in case they don't have a watch.
  *
  * If the user disables feedback, send one final message to tell sender
- * to forget about the xmin on this standby.
+ * to forget about the xmin on this standby. We also send this message
+ * on first connect because a previous connection might have set xmin
+ * on a replication slot.
  */
 static void
 XLogWalRcvSendHSFeedback(bool immed)
@@ -1167,7 +1169,7 @@ XLogWalRcvSendHSFeedback(bool immed)
 	uint32		nextEpoch;
 	TransactionId xmin;
 	static TimestampTz sendTime = 0;
-	static bool master_has_standby_xmin = false;
+	static bool master_has_standby_xmin = true;
 
 	/*
 	 * If the user doesn't want status to be reported to the master, be sure
@@ -1192,20 +1194,13 @@ XLogWalRcvSendHSFeedback(bool immed)
 	}
 
 	/*
-	 * If Hot Standby is not yet active there is nothing to send. Check this
-	 * after the interval has expired to reduce number of calls.
-	 */
-	if (!HotStandbyActive())
-	{
-		Assert(!master_has_standby_xmin);
-		return;
-	}
-
-	/*
 	 * Make the expensive call to get the oldest xmin once we are certain
 	 * everything else has been checked.
+	 *
+	 * If Hot Standby is not yet active we reset the xmin value. Check this
+	 * after the interval has expired to reduce number of calls.
 	 */
-	if (hot_standby_feedback)
+	if (hot_standby_feedback && HotStandbyActive())
 		xmin = GetOldestXmin(NULL, false);
 	else
 		xmin = InvalidTransactionId;


slot-xmin-not-reset-reproduce.sh
Description: Bourne shell script

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers