Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2017-01-04 Thread Simon Riggs
On 5 January 2017 at 04:50, Michael Paquier  wrote:

> The perldoc documentation is broken for the new routines.
...
> Attached is a patch to fix all those small issues.

Thanks, looks good, will apply.


> It may be a good idea to run perltidy on top of that to be honest...

Should we add that to the makefile? I guess start a new thread if you think so.

Anything that helps me check its correct is a good thing AFAICS.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2017-01-04 Thread Michael Paquier
On Thu, Jan 5, 2017 at 1:58 AM, Simon Riggs  wrote:
> On 4 January 2017 at 08:16, Craig Ringer  wrote:
>> When committed I will update the decoding on standby series to omit
>> these pre-requisite patches.
>
> Committed, thanks.

I was planning a round of reviews of those patches, but you were faster than me.

So now looking at what has been committed.

The perldoc documentation is broken for the new routines. The commits
have added patterns like that:
=pod $node->routine_name
But what needs to be done is to use =pod and =item, like that:
=pod
=item $node->routine_name

+Look up xlog positions on the server:
This has better be "*WAL* positions". There is another reference with
xlog (see recent threads about renaming those functions for example).

+* insert position (master only, error on replica)
+* write position (master only, error on replica)
+* flush position
+* receive position (always undef on master)
+* replay position
Replay position is always undefined on primary, let's document it in
the description of the routine. And flush position generates an error
for a standby.

The documentation of $node->lsn is generated like that, with a very
unfriendly list of modes:
$node->lsn(mode)
Look up xlog positions on the server:

* insert position (master only, error on replica) * write position
(master only, error on replica) * flush position * receive position
(always undef on master) * replay position
A trick that I have found here is to add a space before the '*'.

It may be a good idea to run perltidy on top of that to be honest...

Attached is a patch to fix all those small issues.
-- 
Michael
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 2a4ceb3d42..053c5ea787 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1324,15 +1324,17 @@ sub run_log
TestLib::run_log(@_);
 }
 
-=pod $node->lsn(mode)
+=pod
+
+=item $node->lsn(mode)
 
-Look up xlog positions on the server:
+Look up WAL positions on the server:
 
-* insert position (master only, error on replica)
-* write position (master only, error on replica)
-* flush position
-* receive position (always undef on master)
-* replay position
+ * insert position (master only, error on replica)
+ * write position (master only, error on replica)
+ * flush position (master only, error on replica)
+ * receive position (always undef on master)
+ * replay position (always undef on master)
 
 mode must be specified.
 
@@ -1363,11 +1365,13 @@ sub lsn
}
 }
 
-=pod $node->wait_for_catchup(standby_name, mode, target_lsn)
+=pod
+
+=item $node->wait_for_catchup(standby_name, mode, target_lsn)
 
 Wait for the node with application_name standby_name (usually from node->name)
 until its replication position in pg_stat_replication equals or passes the
-upstream's xlog insert point at the time this function is called. By default
+upstream's WAL insert point at the time this function is called. By default
 the replay_location is waited for, but 'mode' may be specified to wait for any
 of sent|write|flush|replay.
 
@@ -1401,7 +1405,9 @@ sub wait_for_catchup
print "done\n";
 }
 
-=pod $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
+=pod
+
+=item $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
 
 Wait for the named replication slot to equal or pass the supplied target_lsn.
 The position used is the restart_lsn unless mode is given, in which case it may
@@ -1435,7 +1441,9 @@ sub wait_for_slot_catchup
print "done\n";
 }
 
-=pod $node->query_hash($dbname, $query, @columns)
+=pod
+
+=item $node->query_hash($dbname, $query, @columns)
 
 Execute $query on $dbname, replacing any appearance of the string __COLUMNS__
 within the query with a comma-separated list of @columns.
@@ -1473,7 +1481,9 @@ sub query_hash
return \%val;
 }
 
-=pod $node->slot(slot_name)
+=pod
+
+=item $node->slot(slot_name)
 
 Return hash-ref of replication slot data for the named slot, or a hash-ref with
 all values '' if not found. Does not differentiate between null and empty 
string

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2017-01-04 Thread Simon Riggs
On 4 January 2017 at 08:16, Craig Ringer  wrote:
> On 4 January 2017 at 12:39, Craig Ringer  wrote:
>
>> To keep things together, I've followed up on the logical decoding on
>> standby thread that now incorporates all these patches.
>
> Attached are the two patches discussed upthread, in their
> proposed-for-commit form, as requested by Simon.
>
> These correspond to patches 0001 and 0004 of the logical decoding on
> standby series at
> https://www.postgresql.org/message-id/CAMsr+YEzC=-+eV09A=ra150fjtkmtqt5q70piqbwytbor3c...@mail.gmail.com
> .
>
> This subset is tracked as https://commitfest.postgresql.org/12/883/ .
>
> When committed I will update the decoding on standby series to omit
> these pre-requisite patches.

Committed, thanks.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2017-01-04 Thread Craig Ringer
On 4 January 2017 at 12:39, Craig Ringer  wrote:

> To keep things together, I've followed up on the logical decoding on
> standby thread that now incorporates all these patches.

Attached are the two patches discussed upthread, in their
proposed-for-commit form, as requested by Simon.

These correspond to patches 0001 and 0004 of the logical decoding on
standby series at
https://www.postgresql.org/message-id/CAMsr+YEzC=-+eV09A=ra150fjtkmtqt5q70piqbwytbor3c...@mail.gmail.com
.

This subset is tracked as https://commitfest.postgresql.org/12/883/ .

When committed I will update the decoding on standby series to omit
these pre-requisite patches.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 79241c8052fbd1ecd079e98fd8564e4b2fcf797b Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 14 Nov 2016 12:27:17 +0800
Subject: [PATCH 1/2] PostgresNode methods to wait for node catchup

---
 src/test/perl/PostgresNode.pm  | 172 -
 src/test/recovery/t/001_stream_rep.pl  |  12 +-
 src/test/recovery/t/004_timeline_switch.pl |  16 +--
 3 files changed, 175 insertions(+), 25 deletions(-)

diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index c1b16ca..2f009d4 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -1121,7 +1121,6 @@ sub psql
 		my $exc_save = $@;
 		if ($exc_save)
 		{
-
 			# IPC::Run::run threw an exception. re-throw unless it's a
 			# timeout, which we'll handle by testing is_expired
 			die $exc_save
@@ -1173,7 +1172,7 @@ sub psql
 		  if $ret == 1;
 		die "connection error: '$$stderr'\nwhile running '@psql_params'"
 		  if $ret == 2;
-		die "error running SQL: '$$stderr'\nwhile running '@psql_params'"
+		die "error running SQL: '$$stderr'\nwhile running '@psql_params' with sql '$sql'"
 		  if $ret == 3;
 		die "psql returns $ret: '$$stderr'\nwhile running '@psql_params'";
 	}
@@ -1325,6 +1324,175 @@ sub run_log
 	TestLib::run_log(@_);
 }
 
+=pod $node->lsn(mode)
+
+Look up xlog positions on the server:
+
+* insert position (master only, error on replica)
+* write position (master only, error on replica)
+* flush position
+* receive position (always undef on master)
+* replay position
+
+mode must be specified.
+
+=cut
+
+sub lsn
+{
+	my ($self, $mode) = @_;
+	my %modes = ('insert' => 'pg_current_xlog_insert_location()',
+ 'flush' => 'pg_current_xlog_flush_location()',
+ 'write' => 'pg_current_xlog_location()',
+ 'receive' => 'pg_last_xlog_receive_location()',
+ 'replay' => 'pg_last_xlog_replay_location()');
+
+	$mode = '' if !defined($mode);
+	die "unknown mode for 'lsn': '$mode', valid modes are " . join(', ', keys %modes)
+		if !defined($modes{$mode});
+
+	my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}");
+	chomp($result);
+	if ($result eq '')
+	{
+		return undef;
+	}
+	else
+	{
+		return $result;
+	}
+}
+
+=pod $node->wait_for_catchup(standby_name, mode, target_lsn)
+
+Wait for the node with application_name standby_name (usually from node->name)
+until its replication position in pg_stat_replication equals or passes the
+upstream's xlog insert point at the time this function is called. By default
+the replay_location is waited for, but 'mode' may be specified to wait for any
+of sent|write|flush|replay.
+
+If there is no active replication connection from this peer, waits until
+poll_query_until timeout.
+
+Requires that the 'postgres' db exists and is accessible.
+
+target_lsn may be any arbitrary lsn, but is typically $master_node->lsn('insert').
+
+This is not a test. It die()s on failure.
+
+=cut
+
+sub wait_for_catchup
+{
+	my ($self, $standby_name, $mode, $target_lsn) = @_;
+	$mode = defined($mode) ? $mode : 'replay';
+	my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 );
+	die "unknown mode $mode for 'wait_for_catchup', valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode});
+	# Allow passing of a PostgresNode instance as shorthand
+	if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") )
+	{
+		$standby_name = $standby_name->name;
+	}
+	die 'target_lsn must be specified' unless defined($target_lsn);
+	print "Waiting for replication conn " . $standby_name . "'s " . $mode . "_location to pass " . $target_lsn . " on " . $self->name . "\n";
+	my $query = qq[SELECT '$target_lsn' <= ${mode}_location FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';];
+	$self->poll_query_until('postgres', $query)
+		or die "timed out waiting for catchup, current position is " . ($self->safe_psql('postgres', $query) || '(unknown)');
+	print "done\n";
+}
+
+=pod $node->wait_for_slot_catchup(slot_name, mode, target_lsn)
+
+Wait for the named replication slot to equal or pass the supplied target_lsn.
+The position used is the restart_lsn unless mode is given, in which case it may
+be 'restart' or 'confirmed_flu

Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2017-01-03 Thread Craig Ringer
On 3 January 2017 at 12:36, Craig Ringer  wrote:
> On 2 January 2017 at 20:17, Simon Riggs  wrote:
>
>> Bit confused... can't see a caller for wait_for_slot_catchup() and the
>> slot tests don't call it AFAICS.
>
> The recovery tests for decoding on standby will use it. I can delay
> adding it until then.
>
>> Also can't see anywhere the LSN stuff is used either.
>
> Removed after discussion with Michael in the logical decoding on standby 
> thread.
>
> I'll be posting a new patch series there shortly, which removes
> pg_lsn.pm and modifies the PostgresNode.pm changes per discussion. If
> you're able to commit the updated PostgresNode.pm and new standby
> tests that'd be very handy.

To keep things together, I've followed up on the logical decoding on
standby thread that now incorporates all these patches.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2017-01-02 Thread Craig Ringer
On 2 January 2017 at 20:17, Simon Riggs  wrote:

> Bit confused... can't see a caller for wait_for_slot_catchup() and the
> slot tests don't call it AFAICS.

The recovery tests for decoding on standby will use it. I can delay
adding it until then.

> Also can't see anywhere the LSN stuff is used either.

Removed after discussion with Michael in the logical decoding on standby thread.

I'll be posting a new patch series there shortly, which removes
pg_lsn.pm and modifies the PostgresNode.pm changes per discussion. If
you're able to commit the updated PostgresNode.pm and new standby
tests that'd be very handy.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2017-01-02 Thread Simon Riggs
On 14 November 2016 at 07:01, Craig Ringer  wrote:
> Every test I write with the TAP framework for recovery seems to need
> to wait for one node to catch up to another or examine replication
> slot state. So: attached is a patch to add helper methods for these
> tasks.
>
> Also add a new pg_lsn.pm helper class to deal with PostgreSQL's
> awkward representation of LSNs in perl. Made extra fun by our use of
> Perl 5.8.8 with no new modules, so we can't rely on having 64-bit
> integers. Provides sensible LSN comparisons. I'll be using this in
> coming patches that have to make assertions about differences between
> LSNs, and I think it's sensible to have anyway. Incorporates tests for
> the class.
>
> Finally, add some coverage of physical replication slots to recovery tests.
>
> Backpatch to 9.6 desirable, since we seem to be doing that for TAP
> infrastructure.
>
> These three are related enough, and all only touch the TAP framework,
> so I've bundled them in a series.

Good to see more tests going in.

Bit confused... can't see a caller for wait_for_slot_catchup() and the
slot tests don't call it AFAICS.

Also can't see anywhere the LSN stuff is used either.

No specific problems with the code, just don't want to commit code
that isn't used anywhere, yet.

I want to commit the extra tests soon, as a stronger test for my
recovery.conf changes.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests

2016-11-13 Thread Craig Ringer
Every test I write with the TAP framework for recovery seems to need
to wait for one node to catch up to another or examine replication
slot state. So: attached is a patch to add helper methods for these
tasks.

Also add a new pg_lsn.pm helper class to deal with PostgreSQL's
awkward representation of LSNs in perl. Made extra fun by our use of
Perl 5.8.8 with no new modules, so we can't rely on having 64-bit
integers. Provides sensible LSN comparisons. I'll be using this in
coming patches that have to make assertions about differences between
LSNs, and I think it's sensible to have anyway. Incorporates tests for
the class.

Finally, add some coverage of physical replication slots to recovery tests.

Backpatch to 9.6 desirable, since we seem to be doing that for TAP
infrastructure.

These three are related enough, and all only touch the TAP framework,
so I've bundled them in a series.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From f01790bb3f47f7fabf64328f8c9a01e8f3cf837c Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Wed, 9 Nov 2016 13:44:04 +0800
Subject: [PATCH 3/3] Expand streaming replication tests to cover hot standby
 feedback and physical replication slots

---
 src/test/recovery/t/001_stream_rep.pl | 105 +-
 1 file changed, 104 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 5ce69bb..ef29892 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 use PostgresNode;
 use TestLib;
-use Test::More tests => 4;
+use Test::More tests => 22;
 
 # Initialize master node
 my $node_master = get_new_node('master');
@@ -58,3 +58,106 @@ is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 1');
 is($node_standby_2->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 2');
+
+diag "switching to physical replication slot";
+# Switch to using a physical replication slot. We can do this without a new
+# backup since physical slots can go backwards if needed. Do so on both
+# standbys. Since we're going to be testing things that affect the slot state,
+# also increase the standby feedback interval to ensure timely updates.
+my ($slotname_1, $slotname_2) = ('standby_1', 'standby_2');
+$node_master->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_master->restart;
+is($node_master->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_1');]), 0, 'physical slot created on master');
+$node_standby_1->append_conf('recovery.conf', "primary_slot_name = $slotname_1\n");
+$node_standby_1->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n");
+$node_standby_1->restart;
+is($node_standby_1->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_2');]), 0, 'physical slot created on intermediate replica');
+$node_standby_2->append_conf('recovery.conf', "primary_slot_name = $slotname_2\n");
+$node_standby_2->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n");
+$node_standby_2->restart;
+
+sub get_slot_xmins
+{
+	my ($node, $slotname) = @_;
+	my $slotinfo = $node->slot($slotname);
+	return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'});
+}
+
+# There's no hot standby feedback and there are no logical slots on either peer
+# so xmin and catalog_xmin should be null on both slots.
+my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1);
+is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback');
+
+($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2);
+is($xmin, '', 'cascaded slot xmin null with no hs_feedback');
+is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback');
+
+# Replication still works?
+$node_master->safe_psql('postgres', 'CREATE TABLE replayed(val integer);');
+
+sub replay_check
+{
+	my $newval = $node_master->safe_psql('postgres', 'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val');
+	$node_master->wait_for_catchup($node_standby_1);
+	$node_standby_1->wait_for_catchup($node_standby_2);
+	$node_standby_1->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_1 didn't replay master value $newval";
+	$node_standby_2->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval])
+		or die "standby_2 didn't replay standby_1 value $newval";
+}
+
+replay_check();
+
+diag "enabling hot_standby_feedback";
+# Enable hs_feedback. The slot should gain an xmin. We set the status interval
+# so we'll see the results promptly.
+$node_standby_1->safe_psql('postgres', 'ALTER SYSTEM SET ho