pgsql: Remove incorrect assertion in reorderbuffer.c.
Remove incorrect assertion in reorderbuffer.c.

We start recording changes in ReorderBufferTXN even before we reach the
SNAPBUILD_CONSISTENT state, so that if the commit is encountered after
reaching that state we are able to send the changes of the entire
transaction. Now, while recording changes, if the reorder buffer memory
has exceeded logical_decoding_work_mem, then we can start streaming,
provided streaming is allowed and we haven't yet streamed that data.
However, we must not allow streaming to start unless the snapshot has
reached the SNAPBUILD_CONSISTENT state.

In passing, improve the comments atop ReorderBufferResetTXN to mention
the case when we need to continue streaming after getting an error.

Author: Amit Kapila
Reviewed-by: Dilip Kumar
Discussion: https://postgr.es/m/caa4ek1kooh0byboyyy40nbcc7fe812trwta+wy3jqf7wqwz...@mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8ae4ef4fb0e0331f02c4615182600546c8e5c4d4

Modified Files
--------------
src/bin/pg_rewind/... (see commitdiff)
src/backend/replication/logical/reorderbuffer.c | 10 ++
1 file changed, 6 insertions(+), 4 deletions(-)
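
Editor's note: the essence of the fix is that streaming eligibility has to be a runtime check on the snapshot builder's state rather than an assertion, because changes can legitimately accumulate before a consistent snapshot exists. The following standalone C sketch only models that gating logic; the type and function names (DecodingContext, can_start_streaming, the simplified SnapBuildState enum) are illustrative stand-ins, not the actual reorderbuffer/snapbuild code.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Simplified model of PostgreSQL's snapshot builder states. */
    typedef enum
    {
        SNAPBUILD_START,
        SNAPBUILD_BUILDING_SNAPSHOT,
        SNAPBUILD_FULL_SNAPSHOT,
        SNAPBUILD_CONSISTENT
    } SnapBuildState;

    typedef struct
    {
        size_t          mem_used;           /* bytes buffered for decoding */
        size_t          work_mem;           /* logical_decoding_work_mem, in bytes */
        bool            streaming_allowed;  /* output plugin supports streaming */
        SnapBuildState  snap_state;         /* current snapshot builder state */
    } DecodingContext;

    /*
     * Hypothetical stand-in for the streaming-eligibility test: even when the
     * memory limit is exceeded and streaming is enabled, do not stream until
     * the snapshot has reached the consistent state.
     */
    static bool
    can_start_streaming(const DecodingContext *ctx)
    {
        if (!ctx->streaming_allowed)
            return false;
        if (ctx->mem_used < ctx->work_mem)
            return false;
        /* The point of the fix: check consistency instead of asserting it. */
        return ctx->snap_state == SNAPBUILD_CONSISTENT;
    }

    int
    main(void)
    {
        DecodingContext ctx = {
            .mem_used = 128 * 1024 * 1024,
            .work_mem = 64 * 1024 * 1024,
            .streaming_allowed = true,
            .snap_state = SNAPBUILD_BUILDING_SNAPSHOT,
        };

        printf("before consistent snapshot: %s\n",
               can_start_streaming(&ctx) ? "stream" : "keep buffering/spill");

        ctx.snap_state = SNAPBUILD_CONSISTENT;
        printf("after consistent snapshot:  %s\n",
               can_start_streaming(&ctx) ? "stream" : "keep buffering/spill");
        return 0;
    }
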
pgsql: Remove unnecessary grammar symbols
Remove unnecessary grammar symbols

Instead of publication_name_list, we can use name_list. We already refer
to publications everywhere else by the 'name' or 'name_list' symbols, so
this only improves consistency.

Reviewed-by: Tom Lane
Discussion: https://www.postgresql.org/message-id/flat/3e3ccddb-41bd-ecd8-29fe-195e34d9886f%40enterprisedb.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/a6964bc1bb0793e20636ccb573cd2a5ad3ef7667

Modified Files
--------------
src/backend/parser/gram.y | 20 ++--
1 file changed, 2 insertions(+), 18 deletions(-)
pgsql: Convert elog(LOG) calls to ereport() where appropriate
Convert elog(LOG) calls to ereport() where appropriate

User-visible log messages should go through ereport(), so they are
subject to translation. Many remaining elog(LOG) calls are really
debugging calls.

Reviewed-by: Alvaro Herrera
Reviewed-by: Michael Paquier
Reviewed-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/flat/92d6f545-5102-65d8-3c87-489f71ea0a37%40enterprisedb.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/eb93f3a0b633ad6afb0f37391b87f460c4b0663b

Modified Files
--------------
src/backend/access/gist/gist.c           | 5 +-
src/backend/access/nbtree/nbtpage.c      | 7 ++-
src/backend/access/transam/xlog.c        | 100 +--
src/backend/libpq/auth.c                 | 7 ++-
src/backend/libpq/hba.c                  | 3 +-
src/backend/libpq/pqcomm.c               | 56 +++--
src/backend/postmaster/bgworker.c        | 8 +--
src/backend/postmaster/pgstat.c          | 17 +++---
src/backend/postmaster/postmaster.c      | 44 +-
src/backend/replication/logical/origin.c | 9 +--
src/backend/storage/file/fd.c            | 8 ++-
src/backend/utils/misc/guc.c             | 2 +-
12 files changed, 172 insertions(+), 94 deletions(-)
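
Editor's note: for readers unfamiliar with the two call styles, the conversion looks roughly like this. The fragment assumes the backend environment (postgres.h) and uses a made-up message; it is not a hunk from the commit.

    #include "postgres.h"   /* backend-only: provides elog(), ereport(), errmsg() */

    void
    report_request_status(int pending)
    {
        /*
         * Debug-style message: stays as elog(), not marked for translation.
         * (Illustrative message text, not taken from the commit.)
         */
        elog(LOG, "request skipped, %d requests pending", pending);

        /*
         * User-visible message: goes through ereport() so errmsg() marks the
         * string for translation and the call can later grow errdetail(),
         * errhint(), and so on.
         */
        ereport(LOG,
                (errmsg("request skipped, %d requests pending",
                        pending)));
    }
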
pgsql: Fix race conditions in newly-added test.
Fix race conditions in newly-added test.

The buildfarm has been failing sporadically on the new test. I was able
to reproduce this by adding a random 0-10 s delay in the walreceiver,
just before it connects to the primary. There's a race condition where
node_3 is promoted before it has fully caught up with node_1, leading to
diverged timelines. When node_1 is later reconfigured as a standby
following node_3, it fails to catch up:

LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already while we are just setting up the actual pg_rewind
scenario we want to test. So change the test to wait until node_3 is
connected and fully caught up before promoting it, so that we get a
clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5
should support "clean switchover", where the primary makes sure that
pending WAL is replicated to the standby on shutdown. It would be nice to
figure out what's going on there, but that's independent of pg_rewind and
the scenario that this test tests.

Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
------
REL_11_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/cda50f2112f29eb4bca148b1dd7f06efa559acdf

Modified Files
--------------
src/bin/pg_rewind/t/008_min_recovery_point.pl | 36 ---
1 file changed, 22 insertions(+), 14 deletions(-)
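
Editor's note: the actual fix lives in the Perl TAP test, but the shape of the wait is easy to model. This standalone C sketch (all names and LSN values are hypothetical, not taken from the test) illustrates "block until the standby has replayed everything the primary has written, then promote":

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef uint64_t XLogRecPtr;    /* WAL position as a 64-bit offset */

    /* Hypothetical stand-ins for querying the primary/standby for LSNs. */
    static XLogRecPtr primary_insert_lsn = 0x2000000;
    static XLogRecPtr standby_replay_lsn = 0x1FFF000;

    static XLogRecPtr
    get_primary_insert_lsn(void)
    {
        return primary_insert_lsn;
    }

    static XLogRecPtr
    get_standby_replay_lsn(void)
    {
        /* Pretend the standby keeps replaying WAL while we poll. */
        standby_replay_lsn += 0x100;
        return standby_replay_lsn;
    }

    int
    main(void)
    {
        XLogRecPtr  target = get_primary_insert_lsn();

        /*
         * Promoting before the standby has replayed up to 'target' is the race
         * the test hit: the promoted node forks a new timeline from an older
         * point, and the old primary can no longer follow it. So poll until
         * the standby has caught up before promoting.
         */
        while (get_standby_replay_lsn() < target)
        {
            /* in the real TAP test this is a poll-and-sleep style wait */
        }

        printf("standby caught up to the primary's insert position; safe to promote\n");
        return 0;
    }
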
pgsql: Fix race conditions in newly-added test.
Fix race conditions in newly-added test.

The buildfarm has been failing sporadically on the new test. I was able
to reproduce this by adding a random 0-10 s delay in the walreceiver,
just before it connects to the primary. There's a race condition where
node_3 is promoted before it has fully caught up with node_1, leading to
diverged timelines. When node_1 is later reconfigured as a standby
following node_3, it fails to catch up:

LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already while we are just setting up the actual pg_rewind
scenario we want to test. So change the test to wait until node_3 is
connected and fully caught up before promoting it, so that we get a
clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5
should support "clean switchover", where the primary makes sure that
pending WAL is replicated to the standby on shutdown. It would be nice to
figure out what's going on there, but that's independent of pg_rewind and
the scenario that this test tests.

Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
------
REL_13_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/e41a2efbca10438fa0a506d4158edd1a1964aacf

Modified Files
--------------
src/bin/pg_rewind/t/008_min_recovery_point.pl | 33 ---
1 file changed, 20 insertions(+), 13 deletions(-)
pgsql: Fix race conditions in newly-added test.
Fix race conditions in newly-added test.

The buildfarm has been failing sporadically on the new test. I was able
to reproduce this by adding a random 0-10 s delay in the walreceiver,
just before it connects to the primary. There's a race condition where
node_3 is promoted before it has fully caught up with node_1, leading to
diverged timelines. When node_1 is later reconfigured as a standby
following node_3, it fails to catch up:

LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already while we are just setting up the actual pg_rewind
scenario we want to test. So change the test to wait until node_3 is
connected and fully caught up before promoting it, so that we get a
clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5
should support "clean switchover", where the primary makes sure that
pending WAL is replicated to the standby on shutdown. It would be nice to
figure out what's going on there, but that's independent of pg_rewind and
the scenario that this test tests.

Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
------
REL_10_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/45d3631450940c8fb69a5911bda497ebcb72c8cc

Modified Files
--------------
src/bin/pg_rewind/t/008_min_recovery_point.pl | 38 ---
1 file changed, 23 insertions(+), 15 deletions(-)
pgsql: Fix race conditions in newly-added test.
Fix race conditions in newly-added test.

The buildfarm has been failing sporadically on the new test. I was able
to reproduce this by adding a random 0-10 s delay in the walreceiver,
just before it connects to the primary. There's a race condition where
node_3 is promoted before it has fully caught up with node_1, leading to
diverged timelines. When node_1 is later reconfigured as a standby
following node_3, it fails to catch up:

LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already while we are just setting up the actual pg_rewind
scenario we want to test. So change the test to wait until node_3 is
connected and fully caught up before promoting it, so that we get a
clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5
should support "clean switchover", where the primary makes sure that
pending WAL is replicated to the standby on shutdown. It would be nice to
figure out what's going on there, but that's independent of pg_rewind and
the scenario that this test tests.

Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
------
REL_12_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/ad3fb04b9cc2d490e64d4a16e516b5f9eeadc7f3

Modified Files
--------------
src/bin/pg_rewind/t/008_min_recovery_point.pl | 33 ---
1 file changed, 20 insertions(+), 13 deletions(-)
pgsql: Fix race conditions in newly-added test.
Fix race conditions in newly-added test.

The buildfarm has been failing sporadically on the new test. I was able
to reproduce this by adding a random 0-10 s delay in the walreceiver,
just before it connects to the primary. There's a race condition where
node_3 is promoted before it has fully caught up with node_1, leading to
diverged timelines. When node_1 is later reconfigured as a standby
following node_3, it fails to catch up:

LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already while we are just setting up the actual pg_rewind
scenario we want to test. So change the test to wait until node_3 is
connected and fully caught up before promoting it, so that we get a
clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5
should support "clean switchover", where the primary makes sure that
pending WAL is replicated to the standby on shutdown. It would be nice to
figure out what's going on there, but that's independent of pg_rewind and
the scenario that this test tests.

Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/36a4ac20fcf31361bd42b63b1b3390b28827a69e

Modified Files
--------------
src/bin/pg_rewind/t/008_min_recovery_point.pl | 33 ---
1 file changed, 20 insertions(+), 13 deletions(-)
pgsql: Fix race conditions in newly-added test.
Fix race conditions in newly-added test.

The buildfarm has been failing sporadically on the new test. I was able
to reproduce this by adding a random 0-10 s delay in the walreceiver,
just before it connects to the primary. There's a race condition where
node_3 is promoted before it has fully caught up with node_1, leading to
diverged timelines. When node_1 is later reconfigured as a standby
following node_3, it fails to catch up:

LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already while we are just setting up the actual pg_rewind
scenario we want to test. So change the test to wait until node_3 is
connected and fully caught up before promoting it, so that we get a
clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5
should support "clean switchover", where the primary makes sure that
pending WAL is replicated to the standby on shutdown. It would be nice to
figure out what's going on there, but that's independent of pg_rewind and
the scenario that this test tests.

Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
------
REL9_5_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/1dd608bbac28a5dfcaade0ffb56f0dc4f61f7320

Modified Files
--------------
src/bin/pg_rewind/t/008_min_recovery_point.pl | 49 +++
1 file changed, 34 insertions(+), 15 deletions(-)
pgsql: Fix race conditions in newly-added test.
Fix race conditions in newly-added test.

The buildfarm has been failing sporadically on the new test. I was able
to reproduce this by adding a random 0-10 s delay in the walreceiver,
just before it connects to the primary. There's a race condition where
node_3 is promoted before it has fully caught up with node_1, leading to
diverged timelines. When node_1 is later reconfigured as a standby
following node_3, it fails to catch up:

LOG: primary server contains no more WAL on requested timeline 1
LOG: new timeline 2 forked off current database system timeline 1 before current recovery point 0/3A0

That's the situation where you'd need to use pg_rewind, but in this case
it happens already while we are just setting up the actual pg_rewind
scenario we want to test. So change the test to wait until node_3 is
connected and fully caught up before promoting it, so that we get a
clean, controlled failover.

Also rewrite some of the comments, for clarity. The existing comments
detailed what each step in the test did, but didn't give a good overview
of the situation the steps were trying to create.

For reasons I don't understand, the test setup had to be written slightly
differently in 9.6 and 9.5 than in later versions. The 9.5/9.6 version
needed node 1 to be reinitialized from backup, whereas in later versions
it could be shut down and reconfigured to be a standby. But even 9.5
should support "clean switchover", where the primary makes sure that
pending WAL is replicated to the standby on shutdown. It would be nice to
figure out what's going on there, but that's independent of pg_rewind and
the scenario that this test tests.

Discussion: https://www.postgresql.org/message-id/b0a3b95b-82d2-6089-6892-40570f8c5e60%40iki.fi

Branch
------
REL9_6_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/a075c84f2ce17cad75c3afe57de2ecb05a180288

Modified Files
--------------
src/bin/pg_rewind/t/008_min_recovery_point.pl | 49 +++
1 file changed, 34 insertions(+), 15 deletions(-)
