pgsql: Remove incorrect assertion in reorderbuffer.c.

2020-12-04 Thread Amit Kapila
Remove incorrect assertion in reorderbuffer.c.

We start recording changes in ReorderBufferTXN even before we reach
SNAPBUILD_CONSISTENT state so that if the commit is encountered after
reaching that we should be able to send the changes of the entire transaction.
Now, while recording changes if the reorder buffer memory has exceeded
logical_decoding_work_mem then we can start streaming if it is allowed and
we haven't yet streamed that data. However, we must not allow streaming to
start unless the snapshot has reached SNAPBUILD_CONSISTENT state.

In passing, improve the comments atop ReorderBufferResetTXN to mention the
case when we need to continue streaming after getting an error.

Author: Amit Kapila
Reviewed-by: Dilip Kumar
Discussion: 
https://postgr.es/m/caa4ek1kooh0byboyyy40nbcc7fe812trwta+wy3jqf7wqwz...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/8ae4ef4fb0e0331f02c4615182600546c8e5c4d4

Modified Files
--
src/backend/replication/logical/reorderbuffer.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)



pgsql: Remove unnecessary grammar symbols

2020-12-04 Thread Peter Eisentraut
Remove unnecessary grammar symbols

Instead of publication_name_list, we can use name_list.  We already
refer to publications everywhere else by the 'name' or 'name_list'
symbols, so this only improves consistency.

Reviewed-by: 
https://www.postgresql.org/message-id/flat/3e3ccddb-41bd-ecd8-29fe-195e34d9886f%40enterprisedb.com
Discussion: Tom Lane 

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/a6964bc1bb0793e20636ccb573cd2a5ad3ef7667

Modified Files
--
src/backend/parser/gram.y | 20 ++--
1 file changed, 2 insertions(+), 18 deletions(-)



pgsql: Convert elog(LOG) calls to ereport() where appropriate

2020-12-04 Thread Peter Eisentraut
Convert elog(LOG) calls to ereport() where appropriate

User-visible log messages should go through ereport(), so they are
subject to translation.  Many remaining elog(LOG) calls are really
debugging calls.

Reviewed-by: Alvaro Herrera 
Reviewed-by: Michael Paquier 
Reviewed-by: Noah Misch 
Discussion: 
https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/eb93f3a0b633ad6afb0f37391b87f460c4b0663b

Modified Files
--
src/backend/access/gist/gist.c   |   5 +-
src/backend/access/nbtree/nbtpage.c  |   7 ++-
src/backend/access/transam/xlog.c| 100 +--
src/backend/libpq/auth.c |   7 ++-
src/backend/libpq/hba.c  |   3 +-
src/backend/libpq/pqcomm.c   |  56 +++--
src/backend/postmaster/bgworker.c|   8 +--
src/backend/postmaster/pgstat.c  |  17 +++---
src/backend/postmaster/postmaster.c  |  44 +-
src/backend/replication/logical/origin.c |   9 +--
src/backend/storage/file/fd.c|   8 ++-
src/backend/utils/misc/guc.c |   2 +-
12 files changed, 172 insertions(+), 94 deletions(-)



pgsql: Fix race conditions in newly-added test.

2020-12-04 Thread Heikki Linnakangas
Fix race conditions in newly-added test.

Buildfarm has been failing sporadically on the new test.  I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 before 
current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.

Discussion: 
https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
--
REL_11_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/cda50f2112f29eb4bca148b1dd7f06efa559acdf

Modified Files
--
src/bin/pg_rewind/t/008_min_recovery_point.pl | 36 ---
1 file changed, 22 insertions(+), 14 deletions(-)



pgsql: Fix race conditions in newly-added test.

2020-12-04 Thread Heikki Linnakangas
Fix race conditions in newly-added test.

Buildfarm has been failing sporadically on the new test.  I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 before 
current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.

Discussion: 
https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
--
REL_13_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/e41a2efbca10438fa0a506d4158edd1a1964aacf

Modified Files
--
src/bin/pg_rewind/t/008_min_recovery_point.pl | 33 ---
1 file changed, 20 insertions(+), 13 deletions(-)



pgsql: Fix race conditions in newly-added test.

2020-12-04 Thread Heikki Linnakangas
Fix race conditions in newly-added test.

Buildfarm has been failing sporadically on the new test.  I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 before 
current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.

Discussion: 
https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
--
REL_10_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/45d3631450940c8fb69a5911bda497ebcb72c8cc

Modified Files
--
src/bin/pg_rewind/t/008_min_recovery_point.pl | 38 ---
1 file changed, 23 insertions(+), 15 deletions(-)



pgsql: Fix race conditions in newly-added test.

2020-12-04 Thread Heikki Linnakangas
Fix race conditions in newly-added test.

Buildfarm has been failing sporadically on the new test.  I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 before 
current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.

Discussion: 
https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
--
REL_12_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/ad3fb04b9cc2d490e64d4a16e516b5f9eeadc7f3

Modified Files
--
src/bin/pg_rewind/t/008_min_recovery_point.pl | 33 ---
1 file changed, 20 insertions(+), 13 deletions(-)



pgsql: Fix race conditions in newly-added test.

2020-12-04 Thread Heikki Linnakangas
Fix race conditions in newly-added test.

Buildfarm has been failing sporadically on the new test.  I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 before 
current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.

Discussion: 
https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/36a4ac20fcf31361bd42b63b1b3390b28827a69e

Modified Files
--
src/bin/pg_rewind/t/008_min_recovery_point.pl | 33 ---
1 file changed, 20 insertions(+), 13 deletions(-)



pgsql: Fix race conditions in newly-added test.

2020-12-04 Thread Heikki Linnakangas
Fix race conditions in newly-added test.

Buildfarm has been failing sporadically on the new test.  I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 before 
current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.

Discussion: 
https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
--
REL9_5_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/1dd608bbac28a5dfcaade0ffb56f0dc4f61f7320

Modified Files
--
src/bin/pg_rewind/t/008_min_recovery_point.pl | 49 +++
1 file changed, 34 insertions(+), 15 deletions(-)



pgsql: Fix race conditions in newly-added test.

2020-12-04 Thread Heikki Linnakangas
Fix race conditions in newly-added test.

Buildfarm has been failing sporadically on the new test.  I was able to
reproduce this by adding a random 0-10 s delay in the walreceiver, just
before it connects to the primary. There's a race condition where node_3
is promoted before it has fully caught up with node_1, leading to diverged
timelines. When node_1 is later reconfigured as standby following node_3,
it fails to catch up:

LOG:  primary server contains no more WAL on requested timeline 1
LOG:  new timeline 2 forked off current database system timeline 1 before 
current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already when we are just setting up the actual pg_rewind
scenario we want to test, so change the test so that it waits until
node_3 is connected and fully caught up before promoting it, so that you
get a clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5 should
support "clean switchover", where primary makes sure that pending WAL is
replicated to standby on shutdown. It would be nice to figure out what's
going on there, but that's independent of pg_rewind and the scenario that
this test tests.

Discussion: 
https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
--
REL9_6_STABLE

Details
---
https://git.postgresql.org/pg/commitdiff/a075c84f2ce17cad75c3afe57de2ecb05a180288

Modified Files
--
src/bin/pg_rewind/t/008_min_recovery_point.pl | 49 +++
1 file changed, 34 insertions(+), 15 deletions(-)