Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On 5 January 2017 at 04:50, Michael Paquier wrote: > The perldoc documentation is broken for the new routines. ... > Attached is a patch to fix all those small issues. Thanks, looks good, will apply. > It may be a good idea to run perltidy on top of that to be honest... Should we add that to the makefile? I guess start a new thread if you think so. Anything that helps me check its correct is a good thing AFAICS. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On Thu, Jan 5, 2017 at 1:58 AM, Simon Riggs wrote: > On 4 January 2017 at 08:16, Craig Ringer wrote: >> When committed I will update the decoding on standby series to omit >> these pre-requisite patches. > > Committed, thanks. I was planning a round of reviews of those patches, but you were faster than me. So now looking at what has been committed. The perldoc documentation is broken for the new routines. The commits have added patterns like that: =pod $node->routine_name But what needs to be done is to use =pod and =item, like that: =pod =item $node->routine_name +Look up xlog positions on the server: This has better be "*WAL* positions". There is another reference with xlog (see recent threads about renaming those functions for example). +* insert position (master only, error on replica) +* write position (master only, error on replica) +* flush position +* receive position (always undef on master) +* replay position Replay position is always undefined on primary, let's document it in the description of the routine. And flush position generates an error for a standby. The documentation of $node->lsn is generated like that, with a very unfriendly list of modes: $node->lsn(mode) Look up xlog positions on the server: * insert position (master only, error on replica) * write position (master only, error on replica) * flush position * receive position (always undef on master) * replay position A trick that I have found here is to add a space before the '*'. It may be a good idea to run perltidy on top of that to be honest... Attached is a patch to fix all those small issues. -- Michael diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index 2a4ceb3d42..053c5ea787 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -1324,15 +1324,17 @@ sub run_log TestLib::run_log(@_); } -=pod $node->lsn(mode) +=pod + +=item $node->lsn(mode) -Look up xlog positions on the server: +Look up WAL positions on the server: -* insert position (master only, error on replica) -* write position (master only, error on replica) -* flush position -* receive position (always undef on master) -* replay position + * insert position (master only, error on replica) + * write position (master only, error on replica) + * flush position (master only, error on replica) + * receive position (always undef on master) + * replay position (always undef on master) mode must be specified. @@ -1363,11 +1365,13 @@ sub lsn } } -=pod $node->wait_for_catchup(standby_name, mode, target_lsn) +=pod + +=item $node->wait_for_catchup(standby_name, mode, target_lsn) Wait for the node with application_name standby_name (usually from node->name) until its replication position in pg_stat_replication equals or passes the -upstream's xlog insert point at the time this function is called. By default +upstream's WAL insert point at the time this function is called. By default the replay_location is waited for, but 'mode' may be specified to wait for any of sent|write|flush|replay. @@ -1401,7 +1405,9 @@ sub wait_for_catchup print "done\n"; } -=pod $node->wait_for_slot_catchup(slot_name, mode, target_lsn) +=pod + +=item $node->wait_for_slot_catchup(slot_name, mode, target_lsn) Wait for the named replication slot to equal or pass the supplied target_lsn. The position used is the restart_lsn unless mode is given, in which case it may @@ -1435,7 +1441,9 @@ sub wait_for_slot_catchup print "done\n"; } -=pod $node->query_hash($dbname, $query, @columns) +=pod + +=item $node->query_hash($dbname, $query, @columns) Execute $query on $dbname, replacing any appearance of the string __COLUMNS__ within the query with a comma-separated list of @columns. @@ -1473,7 +1481,9 @@ sub query_hash return \%val; } -=pod $node->slot(slot_name) +=pod + +=item $node->slot(slot_name) Return hash-ref of replication slot data for the named slot, or a hash-ref with all values '' if not found. Does not differentiate between null and empty string -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On 4 January 2017 at 08:16, Craig Ringer wrote: > On 4 January 2017 at 12:39, Craig Ringer wrote: > >> To keep things together, I've followed up on the logical decoding on >> standby thread that now incorporates all these patches. > > Attached are the two patches discussed upthread, in their > proposed-for-commit form, as requested by Simon. > > These correspond to patches 0001 and 0004 of the logical decoding on > standby series at > https://www.postgresql.org/message-id/CAMsr+YEzC=-+eV09A=ra150fjtkmtqt5q70piqbwytbor3c...@mail.gmail.com > . > > This subset is tracked as https://commitfest.postgresql.org/12/883/ . > > When committed I will update the decoding on standby series to omit > these pre-requisite patches. Committed, thanks. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On 4 January 2017 at 12:39, Craig Ringer wrote: > To keep things together, I've followed up on the logical decoding on > standby thread that now incorporates all these patches. Attached are the two patches discussed upthread, in their proposed-for-commit form, as requested by Simon. These correspond to patches 0001 and 0004 of the logical decoding on standby series at https://www.postgresql.org/message-id/CAMsr+YEzC=-+eV09A=ra150fjtkmtqt5q70piqbwytbor3c...@mail.gmail.com . This subset is tracked as https://commitfest.postgresql.org/12/883/ . When committed I will update the decoding on standby series to omit these pre-requisite patches. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 79241c8052fbd1ecd079e98fd8564e4b2fcf797b Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Mon, 14 Nov 2016 12:27:17 +0800 Subject: [PATCH 1/2] PostgresNode methods to wait for node catchup --- src/test/perl/PostgresNode.pm | 172 - src/test/recovery/t/001_stream_rep.pl | 12 +- src/test/recovery/t/004_timeline_switch.pl | 16 +-- 3 files changed, 175 insertions(+), 25 deletions(-) diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm index c1b16ca..2f009d4 100644 --- a/src/test/perl/PostgresNode.pm +++ b/src/test/perl/PostgresNode.pm @@ -1121,7 +1121,6 @@ sub psql my $exc_save = $@; if ($exc_save) { - # IPC::Run::run threw an exception. re-throw unless it's a # timeout, which we'll handle by testing is_expired die $exc_save @@ -1173,7 +1172,7 @@ sub psql if $ret == 1; die "connection error: '$$stderr'\nwhile running '@psql_params'" if $ret == 2; - die "error running SQL: '$$stderr'\nwhile running '@psql_params'" + die "error running SQL: '$$stderr'\nwhile running '@psql_params' with sql '$sql'" if $ret == 3; die "psql returns $ret: '$$stderr'\nwhile running '@psql_params'"; } @@ -1325,6 +1324,175 @@ sub run_log TestLib::run_log(@_); } +=pod $node->lsn(mode) + +Look up xlog positions on the server: + +* insert position (master only, error on replica) +* write position (master only, error on replica) +* flush position +* receive position (always undef on master) +* replay position + +mode must be specified. + +=cut + +sub lsn +{ + my ($self, $mode) = @_; + my %modes = ('insert' => 'pg_current_xlog_insert_location()', + 'flush' => 'pg_current_xlog_flush_location()', + 'write' => 'pg_current_xlog_location()', + 'receive' => 'pg_last_xlog_receive_location()', + 'replay' => 'pg_last_xlog_replay_location()'); + + $mode = '' if !defined($mode); + die "unknown mode for 'lsn': '$mode', valid modes are " . join(', ', keys %modes) + if !defined($modes{$mode}); + + my $result = $self->safe_psql('postgres', "SELECT $modes{$mode}"); + chomp($result); + if ($result eq '') + { + return undef; + } + else + { + return $result; + } +} + +=pod $node->wait_for_catchup(standby_name, mode, target_lsn) + +Wait for the node with application_name standby_name (usually from node->name) +until its replication position in pg_stat_replication equals or passes the +upstream's xlog insert point at the time this function is called. By default +the replay_location is waited for, but 'mode' may be specified to wait for any +of sent|write|flush|replay. + +If there is no active replication connection from this peer, waits until +poll_query_until timeout. + +Requires that the 'postgres' db exists and is accessible. + +target_lsn may be any arbitrary lsn, but is typically $master_node->lsn('insert'). + +This is not a test. It die()s on failure. + +=cut + +sub wait_for_catchup +{ + my ($self, $standby_name, $mode, $target_lsn) = @_; + $mode = defined($mode) ? $mode : 'replay'; + my %valid_modes = ( 'sent' => 1, 'write' => 1, 'flush' => 1, 'replay' => 1 ); + die "unknown mode $mode for 'wait_for_catchup', valid modes are " . join(', ', keys(%valid_modes)) unless exists($valid_modes{$mode}); + # Allow passing of a PostgresNode instance as shorthand + if ( blessed( $standby_name ) && $standby_name->isa("PostgresNode") ) + { + $standby_name = $standby_name->name; + } + die 'target_lsn must be specified' unless defined($target_lsn); + print "Waiting for replication conn " . $standby_name . "'s " . $mode . "_location to pass " . $target_lsn . " on " . $self->name . "\n"; + my $query = qq[SELECT '$target_lsn' <= ${mode}_location FROM pg_catalog.pg_stat_replication WHERE application_name = '$standby_name';]; + $self->poll_query_until('postgres', $query) + or die "timed out waiting for catchup, current position is " . ($self->safe_psql('postgres', $query) || '(unknown)'); + print "done\n"; +} + +=pod $node->wait_for_slot_catchup(slot_name, mode, target_lsn) + +Wait for the named replication slot to equal or pass the supplied target_lsn. +The position used is the restart_lsn unless mode is given, in which case it may +be 'restart' or 'confirmed_flu
Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On 3 January 2017 at 12:36, Craig Ringer wrote: > On 2 January 2017 at 20:17, Simon Riggs wrote: > >> Bit confused... can't see a caller for wait_for_slot_catchup() and the >> slot tests don't call it AFAICS. > > The recovery tests for decoding on standby will use it. I can delay > adding it until then. > >> Also can't see anywhere the LSN stuff is used either. > > Removed after discussion with Michael in the logical decoding on standby > thread. > > I'll be posting a new patch series there shortly, which removes > pg_lsn.pm and modifies the PostgresNode.pm changes per discussion. If > you're able to commit the updated PostgresNode.pm and new standby > tests that'd be very handy. To keep things together, I've followed up on the logical decoding on standby thread that now incorporates all these patches. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On 2 January 2017 at 20:17, Simon Riggs wrote: > Bit confused... can't see a caller for wait_for_slot_catchup() and the > slot tests don't call it AFAICS. The recovery tests for decoding on standby will use it. I can delay adding it until then. > Also can't see anywhere the LSN stuff is used either. Removed after discussion with Michael in the logical decoding on standby thread. I'll be posting a new patch series there shortly, which removes pg_lsn.pm and modifies the PostgresNode.pm changes per discussion. If you're able to commit the updated PostgresNode.pm and new standby tests that'd be very handy. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
On 14 November 2016 at 07:01, Craig Ringer wrote: > Every test I write with the TAP framework for recovery seems to need > to wait for one node to catch up to another or examine replication > slot state. So: attached is a patch to add helper methods for these > tasks. > > Also add a new pg_lsn.pm helper class to deal with PostgreSQL's > awkward representation of LSNs in perl. Made extra fun by our use of > Perl 5.8.8 with no new modules, so we can't rely on having 64-bit > integers. Provides sensible LSN comparisons. I'll be using this in > coming patches that have to make assertions about differences between > LSNs, and I think it's sensible to have anyway. Incorporates tests for > the class. > > Finally, add some coverage of physical replication slots to recovery tests. > > Backpatch to 9.6 desirable, since we seem to be doing that for TAP > infrastructure. > > These three are related enough, and all only touch the TAP framework, > so I've bundled them in a series. Good to see more tests going in. Bit confused... can't see a caller for wait_for_slot_catchup() and the slot tests don't call it AFAICS. Also can't see anywhere the LSN stuff is used either. No specific problems with the code, just don't want to commit code that isn't used anywhere, yet. I want to commit the extra tests soon, as a stronger test for my recovery.conf changes. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] PostgresNode.pm enhancements, pg_lsn helper, and some more recovery tests
Every test I write with the TAP framework for recovery seems to need to wait for one node to catch up to another or examine replication slot state. So: attached is a patch to add helper methods for these tasks. Also add a new pg_lsn.pm helper class to deal with PostgreSQL's awkward representation of LSNs in perl. Made extra fun by our use of Perl 5.8.8 with no new modules, so we can't rely on having 64-bit integers. Provides sensible LSN comparisons. I'll be using this in coming patches that have to make assertions about differences between LSNs, and I think it's sensible to have anyway. Incorporates tests for the class. Finally, add some coverage of physical replication slots to recovery tests. Backpatch to 9.6 desirable, since we seem to be doing that for TAP infrastructure. These three are related enough, and all only touch the TAP framework, so I've bundled them in a series. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From f01790bb3f47f7fabf64328f8c9a01e8f3cf837c Mon Sep 17 00:00:00 2001 From: Craig Ringer Date: Wed, 9 Nov 2016 13:44:04 +0800 Subject: [PATCH 3/3] Expand streaming replication tests to cover hot standby feedback and physical replication slots --- src/test/recovery/t/001_stream_rep.pl | 105 +- 1 file changed, 104 insertions(+), 1 deletion(-) diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl index 5ce69bb..ef29892 100644 --- a/src/test/recovery/t/001_stream_rep.pl +++ b/src/test/recovery/t/001_stream_rep.pl @@ -3,7 +3,7 @@ use strict; use warnings; use PostgresNode; use TestLib; -use Test::More tests => 4; +use Test::More tests => 22; # Initialize master node my $node_master = get_new_node('master'); @@ -58,3 +58,106 @@ is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'), 3, 'read-only queries on standby 1'); is($node_standby_2->psql('postgres', 'INSERT INTO tab_int VALUES (1)'), 3, 'read-only queries on standby 2'); + +diag "switching to physical replication slot"; +# Switch to using a physical replication slot. We can do this without a new +# backup since physical slots can go backwards if needed. Do so on both +# standbys. Since we're going to be testing things that affect the slot state, +# also increase the standby feedback interval to ensure timely updates. +my ($slotname_1, $slotname_2) = ('standby_1', 'standby_2'); +$node_master->append_conf('postgresql.conf', "max_replication_slots = 4\n"); +$node_master->restart; +is($node_master->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_1');]), 0, 'physical slot created on master'); +$node_standby_1->append_conf('recovery.conf', "primary_slot_name = $slotname_1\n"); +$node_standby_1->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n"); +$node_standby_1->append_conf('postgresql.conf', "max_replication_slots = 4\n"); +$node_standby_1->restart; +is($node_standby_1->psql('postgres', qq[SELECT pg_create_physical_replication_slot('$slotname_2');]), 0, 'physical slot created on intermediate replica'); +$node_standby_2->append_conf('recovery.conf', "primary_slot_name = $slotname_2\n"); +$node_standby_2->append_conf('postgresql.conf', "wal_receiver_status_interval = 1\n"); +$node_standby_2->restart; + +sub get_slot_xmins +{ + my ($node, $slotname) = @_; + my $slotinfo = $node->slot($slotname); + return ($slotinfo->{'xmin'}, $slotinfo->{'catalog_xmin'}); +} + +# There's no hot standby feedback and there are no logical slots on either peer +# so xmin and catalog_xmin should be null on both slots. +my ($xmin, $catalog_xmin) = get_slot_xmins($node_master, $slotname_1); +is($xmin, '', 'non-cascaded slot xmin null with no hs_feedback'); +is($catalog_xmin, '', 'non-cascaded slot xmin null with no hs_feedback'); + +($xmin, $catalog_xmin) = get_slot_xmins($node_standby_1, $slotname_2); +is($xmin, '', 'cascaded slot xmin null with no hs_feedback'); +is($catalog_xmin, '', 'cascaded slot xmin null with no hs_feedback'); + +# Replication still works? +$node_master->safe_psql('postgres', 'CREATE TABLE replayed(val integer);'); + +sub replay_check +{ + my $newval = $node_master->safe_psql('postgres', 'INSERT INTO replayed(val) SELECT coalesce(max(val),0) + 1 AS newval FROM replayed RETURNING val'); + $node_master->wait_for_catchup($node_standby_1); + $node_standby_1->wait_for_catchup($node_standby_2); + $node_standby_1->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval]) + or die "standby_1 didn't replay master value $newval"; + $node_standby_2->safe_psql('postgres', qq[SELECT 1 FROM replayed WHERE val = $newval]) + or die "standby_2 didn't replay standby_1 value $newval"; +} + +replay_check(); + +diag "enabling hot_standby_feedback"; +# Enable hs_feedback. The slot should gain an xmin. We set the status interval +# so we'll see the results promptly. +$node_standby_1->safe_psql('postgres', 'ALTER SYSTEM SET ho