Re: logical replication empty transactions
On Wed, Mar 30, 2022 at 7:15 AM Masahiko Sawada wrote: > > On Tue, Mar 29, 2022 at 6:15 PM houzj.f...@fujitsu.com > wrote: > > > > Thanks for the comment. > > Attach the new version patch with this change. > > > > Thank you for updating the patch. Looks good to me. > Pushed. -- With Regards, Amit Kapila.
Re: logical replication empty transactions
On Tue, Mar 29, 2022 at 6:15 PM houzj.f...@fujitsu.com wrote: > > On Tuesday, March 29, 2022 5:12 PM Amit Kapila > wrote: > > > > On Tue, Mar 29, 2022 at 2:05 PM houzj.f...@fujitsu.com > > wrote: > > > > > > Attach the new version patch which addressed the above comments and > > > slightly adjusted some code comments. > > > > > > > The patch looks good to me. One minor suggestion is to change the function > > name ProcessPendingWritesAndTimeOut() to ProcessPendingWrites(). > > Thanks for the comment. > Attach the new version patch with this change. > Thank you for updating the patch. Looks good to me. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: logical replication empty transactions
On Tue, Mar 29, 2022 5:15 PM Hou, Zhijie/侯 志杰 wrote: > > Thanks for the comment. > Attach the new version patch with this change. > Hi, I did a performance test for this patch to see if it affects performance when publishing empty transactions, based on the v32 patch. In this test, I use synchronous logical replication, and publish a table with no operations on it. The test uses pgbench, each run takes 15 minutes, and I take median of 3 runs. Drop and recreate db after each run. The results are as follows, and attach the bar chart. The details of the test is also attached. TPS - publishing empty transactions (scale factor 1) 4 threads 8 threads 16 threads HEAD4818.2837 4353.6243 3888.5995 patched 5111.2936 4555.1629 4024.4286 TPS - publishing empty transactions (scale factor 100) 4 threads 8 threads 16 threads HEAD9066.6465 16118.0453 21485.1207 patched 9357.3361 16638.6409 24503.6829 There is an improvement of more than 3% after applying this patch, and in the best case, it improves by 14%, which looks good to me. Regards, Shi yu Some parameters specified in postgresql.conf = shared_buffers = 8GB checkpoint_timeout = 30min max_wal_size = 20GB min_wal_size = 10GB autovacuum = off Steps = -- in publisher -- using scale factor 100 pgbench -i postgres -p 5432 -s 100 -- using default scale factor, 1 pgbench -i postgres -p 5432 -- create table in publisher and subscriber CREATE TABLE tbl_empty (a int); -- create publication create publication pub for table tbl_empty; -- create subscription CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub;" Then set synchronous replication and restart publisher server. Run pgbench and record the TPS. pgbench -b tpcb-like -c 4 -j 4 -T 900 -n postgres About the test machine = The processor spec of the test machine is Intel® Xeon® Silver 4210 CPU @2.20GHz with 10 cores/20 threads/2 sockets.
RE: logical replication empty transactions
On Tuesday, March 29, 2022 5:12 PM Amit Kapila wrote: > > On Tue, Mar 29, 2022 at 2:05 PM houzj.f...@fujitsu.com > wrote: > > > > Attach the new version patch which addressed the above comments and > > slightly adjusted some code comments. > > > > The patch looks good to me. One minor suggestion is to change the function > name ProcessPendingWritesAndTimeOut() to ProcessPendingWrites(). Thanks for the comment. Attach the new version patch with this change. Best regards, Hou zj v34-0001-Skip-empty-transactions-for-logical-replication.patch Description: v34-0001-Skip-empty-transactions-for-logical-replication.patch
Re: logical replication empty transactions
On Tue, Mar 29, 2022 at 2:05 PM houzj.f...@fujitsu.com wrote: > > Attach the new version patch which addressed the > above comments and slightly adjusted some code comments. > The patch looks good to me. One minor suggestion is to change the function name ProcessPendingWritesAndTimeOut() to ProcessPendingWrites(). -- With Regards, Amit Kapila.
RE: logical replication empty transactions
On Tuesday, March 29, 2022 3:20 PM Masahiko Sawada wrote: > > Some comments: Thanks for the comments! > > + if (skipped_xact && > + SyncRepRequested() && > + ((volatile WalSndCtlData *) > WalSndCtl)->sync_standbys_defined) > + { > + WalSndKeepalive(false, ctx->write_location); > > I think we can use 'lsn' since it is actually ctx->write_location. Agreed, and changed. > --- > + if (!sent_begin_txn) > + { > + elog(DEBUG1, "Skipped replication of an empty > transaction with XID: %u", txn->xid); > + return; > + } > > The log message should start with lowercase. Changed. > --- > +# Note that the current location of the log file is not grabbed > +immediately # after reloading the configuration, but after sending one > +SQL command to # the node so as we are sure that the reloading has taken > effect. > +$log_location = -s $node_subscriber->logfile; > + > +$node_publisher->safe_psql('postgres', "INSERT INTO tab_notrep VALUES > +(11)"); > + > +$node_publisher->wait_for_catchup('tap_sub'); > + > +$logfile = slurp_file($node_publisher->logfile, $log_location); > > I think we should get the log location of the publisher node, not subscriber > node. Changed. Attach the new version patch which addressed the above comments and slightly adjusted some code comments. Best regards, Hou zj v33-0001-Skip-empty-transactions-for-logical-replication.patch Description: v33-0001-Skip-empty-transactions-for-logical-replication.patch
Re: logical replication empty transactions
On Mon, Mar 28, 2022 at 9:22 PM houzj.f...@fujitsu.com wrote: > > On Monday, March 28, 2022 3:08 PM Amit Kapila wrote: > > > > On Fri, Mar 25, 2022 at 12:50 PM houzj.f...@fujitsu.com > > wrote: > > > > > > Attach the new version patch with this change. > > > > > > > Few comments: > > Thanks for the comments. > > > = > > 1. I think we can move the keep_alive check after the tracklag record > > check to keep it consistent with another patch [1]. > > Changed. > > > 2. Add the comment about the new parameter skipped_xact atop > > WalSndUpdateProgress. > > Added. > > > 3. I think we need to call pq_flush_if_writable after sending a > > keepalive message to avoid delaying sync transactions. > > Agreed. > If we don’t flush the data, we might flush the keepalive later than before. > And > we could get the reply later as well and then the release of syncwait could be > delayed. > > Attach the new version patch which addressed the above comments. > The patch also adds a loop after the newly added keepalive message > to make sure the message is actually flushed to the client like what > did in WalSndWriteData. > Thank you for updating the patch! Some comments: + if (skipped_xact && + SyncRepRequested() && + ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) + { + WalSndKeepalive(false, ctx->write_location); I think we can use 'lsn' since it is actually ctx->write_location. --- + if (!sent_begin_txn) + { + elog(DEBUG1, "Skipped replication of an empty transaction with XID: %u", txn->xid); + return; + } The log message should start with lowercase. --- +# Note that the current location of the log file is not grabbed immediately +# after reloading the configuration, but after sending one SQL command to +# the node so as we are sure that the reloading has taken effect. +$log_location = -s $node_subscriber->logfile; + +$node_publisher->safe_psql('postgres', "INSERT INTO tab_notrep VALUES (11)"); + +$node_publisher->wait_for_catchup('tap_sub'); + +$logfile = slurp_file($node_publisher->logfile, $log_location); I think we should get the log location of the publisher node, not subscriber node. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
RE: logical replication empty transactions
On Monday, March 28, 2022 3:08 PM Amit Kapila wrote: > > On Fri, Mar 25, 2022 at 12:50 PM houzj.f...@fujitsu.com > wrote: > > > > Attach the new version patch with this change. > > > > Few comments: Thanks for the comments. > = > 1. I think we can move the keep_alive check after the tracklag record > check to keep it consistent with another patch [1]. Changed. > 2. Add the comment about the new parameter skipped_xact atop > WalSndUpdateProgress. Added. > 3. I think we need to call pq_flush_if_writable after sending a > keepalive message to avoid delaying sync transactions. Agreed. If we don’t flush the data, we might flush the keepalive later than before. And we could get the reply later as well and then the release of syncwait could be delayed. Attach the new version patch which addressed the above comments. The patch also adds a loop after the newly added keepalive message to make sure the message is actually flushed to the client like what did in WalSndWriteData. Best regards, Hou zj v32-0001-Skip-empty-transactions-for-logical-replication.patch Description: v32-0001-Skip-empty-transactions-for-logical-replication.patch
Re: logical replication empty transactions
On Fri, Mar 25, 2022 at 12:50 PM houzj.f...@fujitsu.com wrote: > > Attach the new version patch with this change. > Few comments: = 1. I think we can move the keep_alive check after the tracklag record check to keep it consistent with another patch [1]. 2. Add the comment about the new parameter skipped_xact atop WalSndUpdateProgress. 3. I think we need to call pq_flush_if_writable after sending a keepalive message to avoid delaying sync transactions. [1]: https://www.postgresql.org/message-id/OS3PR01MB6275C64F264662E84D2FB7AE9E1D9%40OS3PR01MB6275.jpnprd01.prod.outlook.com -- With Regards, Amit Kapila.
RE: logical replication empty transactions
On Friday, March 25, 2022 8:31 AM houzj.f...@fujitsu.com wrote: > On Thursday, March 24, 2022 11:19 AM houzj.f...@fujitsu.com wrote: > > On Tuesday, March 22, 2022 7:50 PM Amit Kapila > > > wrote: > > > On Tue, Mar 22, 2022 at 7:25 AM houzj.f...@fujitsu.com > > > wrote: > > > > > > > > > On Monday, March 21, 2022 6:01 PM Amit Kapila > > > > > > > > > > wrote: > > > > > > > > Oh, sorry, I posted the wrong patch, here is the correct one. > > > > > > > > > > The test change looks good to me. I think additionally we can verify > > > that the record is not reflected in the subscriber table. Apart from > > > that, I had made minor changes mostly in the comments in the attached > > > patch. If those look okay to you, please include those in the next > > > version. > > > > Thanks, the changes look good to me, I merged the diff patch. > > > > Attach the new version patch which include the following changes: > > > > - Fix a typo > > - Change the requestreply flag of the newly added WalSndKeepalive to false, > > because the subscriber can judge whether it's necessary to post a reply > > based > > on the received LSN. > > - Add a testcase to make sure there is no data in subscriber side when the > > transaction is skipped. > > - Change the name of flag skipped_empty_xact to skipped_xact which seems > > more > > understandable. > > - Merge Amit's suggested changes. > > > > I did some more review for the newly added keepalive message and confirmed > that it's necessary to send this in sync mode. Since commit 75b1521 added decoding of sequence to logical replication, this patch needs to have send begin message in pgoutput_sequence if necessary. Attach the new version patch with this change. Best regards, Hou zj v30-0001-Skip-empty-transactions-for-logical-replication.patch Description: v30-0001-Skip-empty-transactions-for-logical-replication.patch
RE: logical replication empty transactions
On Thursday, March 24, 2022 11:19 AM houzj.f...@fujitsu.com wrote: > On Tuesday, March 22, 2022 7:50 PM Amit Kapila > wrote: > > On Tue, Mar 22, 2022 at 7:25 AM houzj.f...@fujitsu.com > > wrote: > > > > > > > On Monday, March 21, 2022 6:01 PM Amit Kapila > > > > > > > > wrote: > > > > > > Oh, sorry, I posted the wrong patch, here is the correct one. > > > > > > > The test change looks good to me. I think additionally we can verify > > that the record is not reflected in the subscriber table. Apart from > > that, I had made minor changes mostly in the comments in the attached > > patch. If those look okay to you, please include those in the next version. > > Thanks, the changes look good to me, I merged the diff patch. > > Attach the new version patch which include the following changes: > > - Fix a typo > - Change the requestreply flag of the newly added WalSndKeepalive to false, > because the subscriber can judge whether it's necessary to post a reply > based > on the received LSN. > - Add a testcase to make sure there is no data in subscriber side when the > transaction is skipped. > - Change the name of flag skipped_empty_xact to skipped_xact which seems > more > understandable. > - Merge Amit's suggested changes. > I did some more review for the newly added keepalive message and confirmed that it's necessary to send this in sync mode. + if (skipped_xact && + SyncRepRequested() && + ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) + WalSndKeepalive(false, ctx->write_location); Because in sync replication, the publisher need to get the reply from subscirber to release the waiter. After applying the patch, we don't send empty transaction to subscriber, so we won't get a reply without this keepalive message. Although the walsender usually invoke WalSndWaitForWal() which will also send a keepalive message to subscriber, and we could get a reply and release the wait. But WalSndWaitForWal() is not always invoked for each record. When reading the page, we won't invoke WalSndWaitForWal() if we already have the record in our buffer[1]. [1] ReadPageInternal( ... /* check whether we have all the requested data already */ if (targetSegNo == state->seg.ws_segno && targetPageOff == state->segoff && reqLen <= state->readLen) return state->readLen; ... Based on above, if we don't have the newly added keepalive message in the patch, the transaction could wait for a bit more time to finish. For example, I did some experiments to confirm: 1. Set LOG_SNAPSHOT_INTERVAL_MS and checkpoint_timeout to a bigger value to make sure it doesn't generate extra WAL which could affect the test. 2. Use debugger to attach the walsender and let it stop in the WalSndWaitForWal() 3. Start two clients and modify un-published table postgres1 # INSERT INTO not_rep VALUES(1); waiting postgres2 # INSERT INTO not_rep VALUES(1); waiting 4. Release the walsender, and we can see it won't send a keepalive to subscriber until it has handled all the above two transactions, which means the two transaction will wait until all of them has been decoded. This behavior doesn't looks good and is inconsistent with the current behavior(the transaction will finish after decoding it or after sending it to sub if necessary). So, I think the newly add keepalive message makes sense. Best regards, Hou zj
RE: logical replication empty transactions
On Thursday, March 24, 2022 11:19 AM Hou, Zhijie/侯 志杰 wrote: > > Attach the new version patch which include the following changes: > > - Fix a typo > - Change the requestreply flag of the newly added WalSndKeepalive to false, > because the subscriber can judge whether it's necessary to post a reply > based > on the received LSN. > - Add a testcase to make sure there is no data in subscriber side when the > transaction is skipped. > - Change the name of flag skipped_empty_xact to skipped_xact which seems > more > understandable. > - Merge Amit's suggested changes. > Hi, This patch skips sending BEGIN/COMMIT messages for empty transactions and saves network bandwidth. So I tried to do a test to see how does it affect bandwidth. This test refers to the previous test by Peter[1]. I temporarily modified the code in worker.c to log the length of the data received by the subscriber (after calling walrcv_receive()). At the conclusion of the test run, the logs are processed to extract the numbers. [1] https://www.postgresql.org/message-id/CAHut%2BPuyqcDJO0X2BxY%2B9ycF%2Bew3x77FiCbTJQGnLDbNmMASZQ%40mail.gmail.com The number of transactions is fixed (1000), and I tested different mixes of empty and not-empty transactions sent - 0%, 25%, 50%, 100%. The patch will send keepalive message when skipping empty transaction in synchronous replication mode, so I tested both synchronous replication and asynchronous replication. The results are as follows, and attach the bar chart. Sync replication - size of sending data 0% 25% 50% 75% 100% HEAD335211 281655 223661 170271 115108 patched 335217 256617 173878 98095 18108 Async replication - size of sending data 0% 25% 50% 75% 100% HEAD339379 285835 236343 184227 115000 patched 335077 260953 180022 11 18126 The details of the test is also attached. Summary of result: In both synchronous replication mode and asynchronous replication mode, as more empty transactions, the improvement is more obvious. Even if when there is no empty transaction, I can't see any overhead. Regards, Shi yu -- create table CREATE TABLE test_tab (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999); CREATE TABLE test_tab_nopub (a int primary key, b text, c timestamptz DEFAULT now(), d bigint DEFAULT 999); test_empty_not_published.sql: BEGIN; INSERT INTO test_tab_nopub VALUES(1, 'foo'); UPDATE test_tab_nopub SET b = 'bar' WHERE a = 1; DELETE FROM test_tab_nopub WHERE a = 1; COMMIT; test_empty_published.sql: BEGIN; INSERT INTO test_tab VALUES(1, 'foo'); UPDATE test_tab SET b = 'bar' WHERE a = 1; DELETE FROM test_tab WHERE a = 1; COMMIT; -- create publication create publication pub for table test_tab; -- create subscription CREATE SUBSCRIPTION sub CONNECTION 'host=localhost port=5432 dbname=postgres' PUBLICATION pub;" -- empty transaction: 0% pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_published.sql postgres -- empty transaction: 25% pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@5 -f test_empty_published.sql@15 postgres -- empty transaction: 50% pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@10 -f test_empty_published.sql@10 postgres -- empty transaction: 75% pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql@15 -f test_empty_published.sql@5 postgres -- empty transaction: 100% pgbench -n -p 5432 -s 100 -t 1000 -c 1 -f test_empty_not_published.sql postgres
RE: logical replication empty transactions
On Tuesday, March 22, 2022 7:50 PM Amit Kapila wrote: > On Tue, Mar 22, 2022 at 7:25 AM houzj.f...@fujitsu.com > wrote: > > > > > On Monday, March 21, 2022 6:01 PM Amit Kapila > > > > > > wrote: > > > > Oh, sorry, I posted the wrong patch, here is the correct one. > > > > The test change looks good to me. I think additionally we can verify that the > record is not reflected in the subscriber table. Apart from that, I had made > minor changes mostly in the comments in the attached patch. If those look > okay to you, please include those in the next version. Thanks, the changes look good to me, I merged the diff patch. Attach the new version patch which include the following changes: - Fix a typo - Change the requestreply flag of the newly added WalSndKeepalive to false, because the subscriber can judge whether it's necessary to post a reply based on the received LSN. - Add a testcase to make sure there is no data in subscriber side when the transaction is skipped. - Change the name of flag skipped_empty_xact to skipped_xact which seems more understandable. - Merge Amit's suggested changes. Best regards, Hou zj v29-0001-Skip-empty-transactions-for-logical-replication.patch Description: v29-0001-Skip-empty-transactions-for-logical-replication.patch
Re: logical replication empty transactions
On Tue, Mar 22, 2022 at 7:25 AM houzj.f...@fujitsu.com wrote: > > > On Monday, March 21, 2022 6:01 PM Amit Kapila > > wrote: > > Oh, sorry, I posted the wrong patch, here is the correct one. > The test change looks good to me. I think additionally we can verify that the record is not reflected in the subscriber table. Apart from that, I had made minor changes mostly in the comments in the attached patch. If those look okay to you, please include those in the next version. -- With Regards, Amit Kapila. v28-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data v28_diff_amit.1.patch Description: Binary data
RE: logical replication empty transactions
> On Monday, March 21, 2022 6:01 PM Amit Kapila > wrote: > > > > On Sat, Mar 19, 2022 at 9:10 AM Ajin Cherian wrote: > > > > > > On Thu, Mar 17, 2022 at 10:43 PM Amit Kapila > > > > > wrote: > > > > > > > 3. Can we add a simple test for it in one of the existing test > > > > files(say in 001_rep_changes.pl)? > > > > > > added a simple test. > > > > > > > This doesn't verify if the transaction is skipped. I think we should > > extend this test to check for a DEBUG message in the Logs (you need to > > probably set log_min_messages to DEBUG1 for this test). As an example, > > you can check the patch [1]. Also, it seems by mistake you have added > > wait_for_catchup() twice. > > I added a testcase to check the DEBUG message. > > > Few other comments: > > = > > 1. Let's keep the parameter name as skipped_empty_xact in > > OutputPluginUpdateProgress so as to not confuse with the other patch's > > [2] keep_alive parameter. I think in this case we must send the > > keep_alive message so as to not make the syncrep wait whereas in the > > other patch we only need to send it periodically based on > > wal_sender_timeout parameter. > > 2. The new function SyncRepEnabled() seems confusing to me as the > > comments in SyncRepWaitForLSN() clearly state why we need to first > > read the parameter 'sync_standbys_defined' without any lock then read > > it again with a lock if the parameter is true. So, I just put that > > check back and also added a similar check in WalSndUpdateProgress. > > 3. > > @@ -1392,11 +1481,21 @@ pgoutput_truncate(LogicalDecodingContext *ctx, > > ReorderBufferTXN *txn, > > continue; > > > > relids[nrelids++] = relid; > > + > > + /* Send BEGIN if we haven't yet */ > > + if (txndata && !txndata->sent_begin_txn) pgoutput_send_begin(ctx, > > + txn); > > maybe_send_schema(ctx, change, relation, relentry); > > } > > > > if (nrelids > 0) > > { > > + txndata = (PGOutputTxnData *) txn->output_plugin_private; > > + > > + /* Send BEGIN if we haven't yet */ > > + if (txndata && !txndata->sent_begin_txn) pgoutput_send_begin(ctx, > > + txn); > > + > > > > Why do we need to try sending the begin in the second check? I think > > it should be sufficient to do it in the above loop. > > > > I have made these and a number of other changes in the attached patch. > > Do let me know what you think of the attached? > > The changes look good to me. > And I did some basic tests for the patch and didn’t find some other problems. > > Attach the new version patch. Oh, sorry, I posted the wrong patch, here is the correct one. Best regards, Hou zj v28-0001-Skip-empty-transactions-for-logical-replication.patch Description: v28-0001-Skip-empty-transactions-for-logical-replication.patch
RE: logical replication empty transactions
On Monday, March 21, 2022 6:01 PM Amit Kapila wrote: > > On Sat, Mar 19, 2022 at 9:10 AM Ajin Cherian wrote: > > > > On Thu, Mar 17, 2022 at 10:43 PM Amit Kapila > wrote: > > > > > 3. Can we add a simple test for it in one of the existing test > > > files(say in 001_rep_changes.pl)? > > > > added a simple test. > > > > This doesn't verify if the transaction is skipped. I think we should > extend this test to check for a DEBUG message in the Logs (you need to > probably set log_min_messages to DEBUG1 for this test). As an example, > you can check the patch [1]. Also, it seems by mistake you have added > wait_for_catchup() twice. I added a testcase to check the DEBUG message. > Few other comments: > = > 1. Let's keep the parameter name as skipped_empty_xact in > OutputPluginUpdateProgress so as to not confuse with the other patch's > [2] keep_alive parameter. I think in this case we must send the > keep_alive message so as to not make the syncrep wait whereas in the > other patch we only need to send it periodically based on > wal_sender_timeout parameter. > 2. The new function SyncRepEnabled() seems confusing to me as the > comments in SyncRepWaitForLSN() clearly state why we need to first > read the parameter 'sync_standbys_defined' without any lock then read > it again with a lock if the parameter is true. So, I just put that > check back and also added a similar check in WalSndUpdateProgress. > 3. > @@ -1392,11 +1481,21 @@ pgoutput_truncate(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > continue; > > relids[nrelids++] = relid; > + > + /* Send BEGIN if we haven't yet */ > + if (txndata && !txndata->sent_begin_txn) > + pgoutput_send_begin(ctx, txn); > maybe_send_schema(ctx, change, relation, relentry); > } > > if (nrelids > 0) > { > + txndata = (PGOutputTxnData *) txn->output_plugin_private; > + > + /* Send BEGIN if we haven't yet */ > + if (txndata && !txndata->sent_begin_txn) > + pgoutput_send_begin(ctx, txn); > + > > Why do we need to try sending the begin in the second check? I think > it should be sufficient to do it in the above loop. > > I have made these and a number of other changes in the attached patch. > Do let me know what you think of the attached? The changes look good to me. And I did some basic tests for the patch and didn’t find some other problems. Attach the new version patch. Best regards, Hou zj v28-0001-Skip-empty-transactions-for-logical-replication.patch Description: v28-0001-Skip-empty-transactions-for-logical-replication.patch
Re: logical replication empty transactions
On Sat, Mar 19, 2022 at 9:10 AM Ajin Cherian wrote: > > On Thu, Mar 17, 2022 at 10:43 PM Amit Kapila wrote: > > > 3. Can we add a simple test for it in one of the existing test > > files(say in 001_rep_changes.pl)? > > added a simple test. > This doesn't verify if the transaction is skipped. I think we should extend this test to check for a DEBUG message in the Logs (you need to probably set log_min_messages to DEBUG1 for this test). As an example, you can check the patch [1]. Also, it seems by mistake you have added wait_for_catchup() twice. Few other comments: = 1. Let's keep the parameter name as skipped_empty_xact in OutputPluginUpdateProgress so as to not confuse with the other patch's [2] keep_alive parameter. I think in this case we must send the keep_alive message so as to not make the syncrep wait whereas in the other patch we only need to send it periodically based on wal_sender_timeout parameter. 2. The new function SyncRepEnabled() seems confusing to me as the comments in SyncRepWaitForLSN() clearly state why we need to first read the parameter 'sync_standbys_defined' without any lock then read it again with a lock if the parameter is true. So, I just put that check back and also added a similar check in WalSndUpdateProgress. 3. @@ -1392,11 +1481,21 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, continue; relids[nrelids++] = relid; + + /* Send BEGIN if we haven't yet */ + if (txndata && !txndata->sent_begin_txn) + pgoutput_send_begin(ctx, txn); maybe_send_schema(ctx, change, relation, relentry); } if (nrelids > 0) { + txndata = (PGOutputTxnData *) txn->output_plugin_private; + + /* Send BEGIN if we haven't yet */ + if (txndata && !txndata->sent_begin_txn) + pgoutput_send_begin(ctx, txn); + Why do we need to try sending the begin in the second check? I think it should be sufficient to do it in the above loop. I have made these and a number of other changes in the attached patch. Do let me know what you think of the attached? [1] - https://www.postgresql.org/message-id/CAA4eK1JbLRj6pSUENfDFsqj0%2BadNob_%3DRPXpnUnWFBskVi5JhA%40mail.gmail.com [2] - https://www.postgresql.org/message-id/CAA4eK1LGnaPuWs2M4sDfpd6JQZjoh4DGAsgUvNW%3DOr8i9z6K8w%40mail.gmail.com -- With Regards, Amit Kapila. v27-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Thu, Mar 17, 2022 at 10:43 PM Amit Kapila wrote: > > Review comments/suggestions: > = > 1. Isn't it sufficient to call pgoutput_send_begin from > maybe_send_schema as that is commonplace for all others and is always > the first message we send? If so, I think we can remove it from other > places? I've done the other way, I've removed it from maybe_send_schema as we always call this prior to calling maybe_send_schema. > 2. Can we write some comments to explain why we don't skip streaming > or prepared empty transactions and some possible solutions (the > protocol change and additional subscription parameter as discussed > [1]) as discussed in this thread pgoutput.c? I've added comment in the header of pgoutput_begin_prepare_txn() and pgoutput_stream_start() > 3. Can we add a simple test for it in one of the existing test > files(say in 001_rep_changes.pl)? added a simple test. > 4. I think we can drop the skip streaming patch as we can't do that for now. Dropped, In addition, I have also added a few more comments explaining why the begin send is delayed in pgoutput_change till row_filter is checked and also ran pgindent. regards, Ajin Cherian Fujitsu Australia v26-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Wed, Mar 16, 2022 at 12:33 PM Ajin Cherian wrote: > > On Mon, Mar 7, 2022 at 11:44 PM Ajin Cherian wrote: > > > > Fixed. > > Review comments/suggestions: = 1. Isn't it sufficient to call pgoutput_send_begin from maybe_send_schema as that is commonplace for all others and is always the first message we send? If so, I think we can remove it from other places? 2. Can we write some comments to explain why we don't skip streaming or prepared empty transactions and some possible solutions (the protocol change and additional subscription parameter as discussed [1]) as discussed in this thread pgoutput.c? 3. Can we add a simple test for it in one of the existing test files(say in 001_rep_changes.pl)? 4. I think we can drop the skip streaming patch as we can't do that for now. -- With Regards, Amit Kapila.
Re: logical replication empty transactions
On Mon, Mar 7, 2022 at 11:44 PM Ajin Cherian wrote: > > Fixed. > > regards, > Ajin Cherian > Fujitsu Australia Rebased the patch and fixed some whitespace errors. regards, Ajin Cherian Fujitsu Australia v25-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data v25-0002-Skip-empty-streamed-transactions-for-logical-rep.patch Description: Binary data
Re: logical replication empty transactions
On Mon, Mar 7, 2022 at 7:50 PM shiy.f...@fujitsu.com wrote: > > On Fri, Mar 4, 2022 9:41 AM Ajin Cherian wrote: > > > > I have split the patch into two. I have kept the logic of skipping > > streaming changes in the second patch. > > I will work on the second patch once we can figure out a solution for > > the COMMIT PREPARED after restart problem. > > > > Thanks for updating the patch. > > A comment on v23-0001 patch. > > @@ -1429,6 +1520,19 @@ pgoutput_message(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > if (in_streaming) > xid = txn->xid; > > + /* > +* Output BEGIN if we haven't yet. > +* Avoid for non-transactional messages. > +*/ > + if (in_streaming || transactional) > + { > + PGOutputTxnData *txndata = (PGOutputTxnData *) > txn->output_plugin_private; > + > + /* Send BEGIN if we haven't yet */ > + if (txndata && !txndata->sent_begin_txn) > + pgoutput_send_begin(ctx, txn); > + } > + > OutputPluginPrepareWrite(ctx, true); > logicalrep_write_message(ctx->out, > xid, > > I think we don't need to send BEGIN if in_streaming is true, right? The first > patch doesn't skip streamed transaction, so should we modify > + if (in_streaming || transactional) > to > + if (!in_streaming && transactional) > ? > Fixed. regards, Ajin Cherian Fujitsu Australia v24-0002-Skip-empty-streamed-transactions-for-logical-rep.patch Description: Binary data v24-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
RE: logical replication empty transactions
On Fri, Mar 4, 2022 9:41 AM Ajin Cherian wrote: > > I have split the patch into two. I have kept the logic of skipping > streaming changes in the second patch. > I will work on the second patch once we can figure out a solution for > the COMMIT PREPARED after restart problem. > Thanks for updating the patch. A comment on v23-0001 patch. @@ -1429,6 +1520,19 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, if (in_streaming) xid = txn->xid; + /* +* Output BEGIN if we haven't yet. +* Avoid for non-transactional messages. +*/ + if (in_streaming || transactional) + { + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; + + /* Send BEGIN if we haven't yet */ + if (txndata && !txndata->sent_begin_txn) + pgoutput_send_begin(ctx, txn); + } + OutputPluginPrepareWrite(ctx, true); logicalrep_write_message(ctx->out, xid, I think we don't need to send BEGIN if in_streaming is true, right? The first patch doesn't skip streamed transaction, so should we modify + if (in_streaming || transactional) to + if (!in_streaming && transactional) ? Regards, Shi yu
Re: logical replication empty transactions
On Fri, Mar 4, 2022 at 12:41 PM Ajin Cherian wrote: > > I have split the patch into two. I have kept the logic of skipping > streaming changes in the second patch. > I will work on the second patch once we can figure out a solution for > the COMMIT PREPARED after restart problem. > Please see below my review comments for the first patch only (v23-0001) == 1. Patch failed to apply cleanly - whitespace warnings. git apply ../patches_misc/v23-0001-Skip-empty-transactions-for-logical-replication.patch ../patches_misc/v23-0001-Skip-empty-transactions-for-logical-replication.patch:68: trailing whitespace. * change in a transaction is processed. This makes it possible warning: 1 line adds whitespace errors. ~~~ 2. src/backend/replication/pgoutput/pgoutput.c - typedef struct PGOutputTxnData +/* + * Maintain a per-transaction level variable to track whether the + * transaction has sent BEGIN. BEGIN is only sent when the first + * change in a transaction is processed. This makes it possible + * to skip transactions that are empty. + */ +typedef struct PGOutputTxnData I felt that this comment is describing details all about its bool member but I think maybe it should be describing something also about the structure itself (because this is the structure comment). E.g. it should say about it only being allocated by the pgoutput_begin_txn() and it is accessible via txn->output_plugin_private. Maybe also say this has subtle implications if this is NULL then it means the tx can't be 2PC etc... ~~~ 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_send_begin +/* + * Send BEGIN. + * + * This is where the BEGIN is actually sent. This is called while processing + * the first change of the transaction. + */ +static void +pgoutput_send_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) IMO there is no need to repeat "This is where the BEGIN is actually sent.", because "Send BEGIN." already said the same thing :-) ~~~ 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_commit_txn + /* + * If a BEGIN message was not yet sent, then it means there were no relevant + * changes encountered, so we can skip the COMMIT message too. + */ + sent_begin_txn = txndata->sent_begin_txn; + txn->output_plugin_private = NULL; + OutputPluginUpdateProgress(ctx, !sent_begin_txn); + + pfree(txndata); Not quite sure why this pfree is positioned where it is (after that function call). I felt this should be a couple of lines up so txndata is freed as soon as you had no more use for it (i.e. after you copied the bool from it) ~~~ 5. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema @@ -594,6 +658,13 @@ maybe_send_schema(LogicalDecodingContext *ctx, if (schema_sent) return; + /* set up txndata */ + txndata = toptxn->output_plugin_private; The comment does quite feel right. Nothing is "setting up" anything. Really, all this does is assign a reference to the tx private data. Probably better with no comment at all? ~~~ 6. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema I observed that every call to the maybe_send_schema function also has adjacent code that already/always is checking to call pgoutput_send_begin_tx function. So then I am wondering is the added logic to the maybe_send_schema even needed at all? It looks a bit redundant. Thoughts? ~~~ 7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change @@ -1141,6 +1212,7 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Relation relation, ReorderBufferChange *change) { PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; MemoryContext old; Maybe if is worth deferring this assignment until after the row-filter check. Otherwise, you are maybe doing it for nothing and IIRC this is hot code so the less you do here the better. OTOH a single assignment probably amounts to almost nothing. ~~~ 8. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change @@ -1354,6 +1438,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, int nrelations, Relation relations[], ReorderBufferChange *change) { PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; + PGOutputTxnData *txndata; MemoryContext old; This variable declaration should be done later in the block where it is assigned. ~~~ 9. src/backend/replication/pgoutput/pgoutput.c - suggestion I notice there is quite a few places in the patch that look like: + txndata = (PGOutputTxnData *) txn->output_plugin_private; + + /* Send BEGIN if we haven't yet */ + if (txndata && !txndata->sent_begin_txn) + pgoutput_send_begin(ctx, txn); + It might be worth considering encapsulating all those in a helper function like: pgoutput_maybe_send_begin(ctx, txn) It would certainly be a lot tidier. ~~~ 10. src/backend/replication/syncrep.c - SyncRepEnabled @@ -539,6 +538,15 @@ SyncRepReleaseWaiters(void) } /* + *
Re: logical replication empty transactions
I have split the patch into two. I have kept the logic of skipping streaming changes in the second patch. I will work on the second patch once we can figure out a solution for the COMMIT PREPARED after restart problem. regards, Ajin Cherian v23-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data v23-0002-Skip-empty-streamed-transactions-for-logical-rep.patch Description: Binary data
Re: logical replication empty transactions
On Wed, Mar 2, 2022 at 1:01 PM shiy.f...@fujitsu.com wrote: > > Hi, > > Here are some comments on the v21 patch. > > 1. > + WalSndKeepalive(false, 0); > > Maybe we can use InvalidXLogRecPtr here, instead of 0. > Fixed. > 2. > + pq_sendint64(_message, writePtr ? writePtr : sentPtr); > > Similarly, should we use XLogRecPtrIsInvalid()? Fixed > > 3. > @@ -1183,6 +1269,20 @@ pgoutput_change(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Assert(false); > } > > + if (in_streaming) > + { > + /* If streaming, send STREAM START if we haven't yet */ > + if (txndata && !txndata->sent_stream_start) > + pgoutput_send_stream_start(ctx, txn); > + } > + else > + { > + /* If not streaming, send BEGIN if we haven't yet */ > + if (txndata && !txndata->sent_begin_txn) > + pgoutput_send_begin(ctx, txn); > + } > + > + > /* Avoid leaking memory by using and resetting our own context */ > old = MemoryContextSwitchTo(data->context); > > > I am not sure if it is suitable to send begin or stream_start here, because > the > row filter is not checked yet. That means, empty transactions caused by row > filter are not skipped. > Moved the check down, so that row_filters are taken into account. regards, Ajin Cherian Fujitsu Australia v22-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Wed, Mar 2, 2022 at 1:01 PM shiy.f...@fujitsu.com wrote: > > 4. > @@ -1617,9 +1829,21 @@ pgoutput_stream_prepare_txn(LogicalDecodingContext > *ctx, > ReorderBufferTXN *txn, > XLogRecPtr > prepare_lsn) > { > + PGOutputTxnData *txndata = txn->output_plugin_private; > + boolsent_begin_txn = txndata->sent_begin_txn; > + > Assert(rbtxn_is_streamed(txn)); > > - OutputPluginUpdateProgress(ctx); > + pfree(txndata); > + txn->output_plugin_private = NULL; > + > + if (!sent_begin_txn) > + { > + elog(DEBUG1, "Skipping replication of an empty transaction in > stream prepare"); > + return; > + } > + > + OutputPluginUpdateProgress(ctx, false); > OutputPluginPrepareWrite(ctx, true); > logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn); > OutputPluginWrite(ctx, true); > > I notice that the patch skips stream prepared transaction, this would cause an > error on subscriber side when committing this transaction on publisher side, > so > I think we'd better not do that. > > For example: > (set logical_decoding_work_mem = 64kB, max_prepared_transactions = 10 in > postgresql.conf) > > -- publisher > create table test (a int, b text, primary key(a)); > create table test2 (a int, b text, primary key(a)); > create publication pub for table test; > > -- subscriber > create table test (a int, b text, primary key(a)); > create table test2 (a int, b text, primary key(a)); > create subscription sub connection 'dbname=postgres port=5432' publication > pub with(two_phase=on, streaming=on); > > -- publisher > begin; > INSERT INTO test2 SELECT i, md5(i::text) FROM generate_series(1, 1000) s(i); > prepare transaction 't'; > commit prepared 't'; > > The error message in subscriber log: > ERROR: prepared transaction with identifier "pg_gid_16391_722" does not exist > Thanks for the test. I guess this mixed streaming+two-phase runs into the same problem that was there while skipping two-phased transactions. If the eventual commit prepared comes after a restart, then there is no way of knowing if the original transaction was skipped or not and we can't know if the commit prepared needs to be sent. I tried not skipping the "stream prepare", but that causes a crash in the apply worker as it tries to find the non-existent streamed file. We could add logic to silently ignore a spurious "stream prepare" but that might not be ideal. Any thoughts on how to address this? Or else, we will need to avoid skipping streamed transactions as well. regards, Ajin Cherian Fujitsu Australia
RE: logical replication empty transactions
Hi, Here are some comments on the v21 patch. 1. + WalSndKeepalive(false, 0); Maybe we can use InvalidXLogRecPtr here, instead of 0. 2. + pq_sendint64(_message, writePtr ? writePtr : sentPtr); Similarly, should we use XLogRecPtrIsInvalid()? 3. @@ -1183,6 +1269,20 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Assert(false); } + if (in_streaming) + { + /* If streaming, send STREAM START if we haven't yet */ + if (txndata && !txndata->sent_stream_start) + pgoutput_send_stream_start(ctx, txn); + } + else + { + /* If not streaming, send BEGIN if we haven't yet */ + if (txndata && !txndata->sent_begin_txn) + pgoutput_send_begin(ctx, txn); + } + + /* Avoid leaking memory by using and resetting our own context */ old = MemoryContextSwitchTo(data->context); I am not sure if it is suitable to send begin or stream_start here, because the row filter is not checked yet. That means, empty transactions caused by row filter are not skipped. 4. @@ -1617,9 +1829,21 @@ pgoutput_stream_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, XLogRecPtr prepare_lsn) { + PGOutputTxnData *txndata = txn->output_plugin_private; + boolsent_begin_txn = txndata->sent_begin_txn; + Assert(rbtxn_is_streamed(txn)); - OutputPluginUpdateProgress(ctx); + pfree(txndata); + txn->output_plugin_private = NULL; + + if (!sent_begin_txn) + { + elog(DEBUG1, "Skipping replication of an empty transaction in stream prepare"); + return; + } + + OutputPluginUpdateProgress(ctx, false); OutputPluginPrepareWrite(ctx, true); logicalrep_write_stream_prepare(ctx->out, txn, prepare_lsn); OutputPluginWrite(ctx, true); I notice that the patch skips stream prepared transaction, this would cause an error on subscriber side when committing this transaction on publisher side, so I think we'd better not do that. For example: (set logical_decoding_work_mem = 64kB, max_prepared_transactions = 10 in postgresql.conf) -- publisher create table test (a int, b text, primary key(a)); create table test2 (a int, b text, primary key(a)); create publication pub for table test; -- subscriber create table test (a int, b text, primary key(a)); create table test2 (a int, b text, primary key(a)); create subscription sub connection 'dbname=postgres port=5432' publication pub with(two_phase=on, streaming=on); -- publisher begin; INSERT INTO test2 SELECT i, md5(i::text) FROM generate_series(1, 1000) s(i); prepare transaction 't'; commit prepared 't'; The error message in subscriber log: ERROR: prepared transaction with identifier "pg_gid_16391_722" does not exist Regards, Shi yu
Re: logical replication empty transactions
On Fri, Feb 25, 2022 at 9:17 PM Peter Smith wrote: > > Hi. Here are my review comments for the v19 patch. > > == > > 1. Commit message > > The current logical replication behavior is to send every transaction to > subscriber even though the transaction is empty (because it does not > contain changes from the selected publications). > > SUGGESTION > "to subscriber even though" --> "to the subscriber even if" Fixed. > > ~~~ > > 2. Commit message > > This patch addresses the above problem by postponing the BEGIN message > until the first change. While processing a COMMIT message, > if there is no other change for that transaction, > do not send COMMIT message. It means that pgoutput will > skip BEGIN/COMMIT messages for transactions that are empty. > > SUGGESTION > "if there is" --> "if there was" > "do not send COMMIT message" --> "do not send the COMMIT message" > "It means that pgoutput" --> "This means that pgoutput" > > ~~~ Fixed. > > 3. Commit message > > Shouldn't there be some similar description about using a lazy send > mechanism for STREAM START? > > ~~~ Added. > > 4. src/backend/replication/pgoutput/pgoutput.c - typedef struct > PGOutputTxnData > > +/* > + * Maintain a per-transaction level variable to track whether the > + * transaction has sent BEGIN. BEGIN is only sent when the first > + * change in a transaction is processed. This makes it possible > + * to skip transactions that are empty. > + */ > +typedef struct PGOutputTxnData > +{ > + bool sent_begin_txn;/* flag indicating whether BEGIN has been sent */ > + bool sent_stream_start; /* flag indicating if stream start has been sent > */ > + bool sent_any_stream; /* flag indicating if any stream has been sent */ > +} PGOutputTxnData; > + > > The struct comment looks stale because it doesn't mention anything > about the similar lazy send mechanism for STREAM_START. > > ~~~ Added. > > 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn > > static void > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > { > + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context, > + sizeof(PGOutputTxnData)); > + > + txndata->sent_begin_txn = false; > + txn->output_plugin_private = txndata; > +} > > You don’t need to assign the other members 'sent_stream_start', > 'sent_any_stream' because you are doing MemoryContextAllocZero anyway, > but for the same reason you did not really need to assign the > 'sent_begin_txn' flag either. > > I guess for consistency maybe it is better to (a) set all of them or > (b) set none of them. I prefer (b). > > ~~~ Did (b) > > 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin > > I feel the 'pgoutput_begin' function is not well named. It makes some > of the code where they are called look quite confusing. > > For streaming there is: > 1. pgoutput_stream_start (does not send) > 2. pgoutput_send_stream_start (does send) > so it is very clear. > > OTOH there are > 3. pgoutput_begin_txn (does not send) > 4. pgoutput_begin (does send) > > For consistency I think the 'pgoutput_begin' name should be changed to > include "send" verb > 1. pgoutput_begin_txn (does not send) > 2. pgoutput_send_begin_txn (does send) > > ~~~ Changed as mentioned. > > 7. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema > > @@ -594,6 +663,20 @@ maybe_send_schema(LogicalDecodingContext *ctx, > if (schema_sent) > return; > > + /* set up txndata */ > + txndata = toptxn->output_plugin_private; > + > + /* > +* Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE > +* is sent. If not, send now. > +*/ > + if (in_streaming && !txndata->sent_stream_start) > + pgoutput_send_stream_start(ctx, toptxn); > + else if (txndata && !txndata->sent_begin_txn) > + { > + pgoutput_begin(ctx, toptxn); > + } > + > > How come the in_streaming case is not checking for a NULL txndata > before referencing it? Even if it is OK to do that, some more comments > or assertions might help for this piece of code. > (Stop-Press: see later comments #9, #10) > > ~~~ Updated. > > 8. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema > > @@ -594,6 +663,20 @@ maybe_send_schema(LogicalDecodingContext *ctx, > if (schema_sent) > return; > > + /* set up txndata */ > + txndata = toptxn->output_plugin_private; > + > + /* > +* Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE > +* is sent. If not, send now. > +*/ > > What part of this code is doing anything about "BEGIN PREPARE" ? > > ~~~ Removed that reference. > > 9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change > > @@ -1183,6 +1267,15 @@ pgoutput_change(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Assert(false); > } > > + /* If streaming, send STREAM START if we haven't yet */ > + if (in_streaming && (txndata && !txndata->sent_stream_start)) > + pgoutput_send_stream_start(ctx, txn); > + /* > + *
Re: logical replication empty transactions
On Fri, Feb 18, 2022 at 9:27 PM Amit Kapila wrote: > > > Yeah, I think there could be multiple ways (a) We can send such a keep > alive in WalSndUpdateProgress() itself by using ctx->write_location. > For this, we need to modify WalSndKeepalive() to take sentPtr as > input. (b) set some flag in WalSndUpdateProgress() and then do it > somewhere in WalSndLoop probably in WalSndKeepaliveIfNecessary, or > maybe there is another better way. > Thanks for the suggestion Amit and Osumi-san, I experimented with both the suggestions but finally decided to use (a)Modifying WalSndKeepalive() to take an LSN optionally as input and passed in the ctx->write_location. I also verified that if I block the WalSndKeepalive() in WalSndWaitForWal, then my new code sends the keepalive when skipping transactions and the syncrep gets back feedback.. I will address comments from Peter and Wang in my next patch update. regards, Ajin Cherian Fujitsu Australia v20-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
Hi. Here are my review comments for the v19 patch. == 1. Commit message The current logical replication behavior is to send every transaction to subscriber even though the transaction is empty (because it does not contain changes from the selected publications). SUGGESTION "to subscriber even though" --> "to the subscriber even if" ~~~ 2. Commit message This patch addresses the above problem by postponing the BEGIN message until the first change. While processing a COMMIT message, if there is no other change for that transaction, do not send COMMIT message. It means that pgoutput will skip BEGIN/COMMIT messages for transactions that are empty. SUGGESTION "if there is" --> "if there was" "do not send COMMIT message" --> "do not send the COMMIT message" "It means that pgoutput" --> "This means that pgoutput" ~~~ 3. Commit message Shouldn't there be some similar description about using a lazy send mechanism for STREAM START? ~~~ 4. src/backend/replication/pgoutput/pgoutput.c - typedef struct PGOutputTxnData +/* + * Maintain a per-transaction level variable to track whether the + * transaction has sent BEGIN. BEGIN is only sent when the first + * change in a transaction is processed. This makes it possible + * to skip transactions that are empty. + */ +typedef struct PGOutputTxnData +{ + bool sent_begin_txn;/* flag indicating whether BEGIN has been sent */ + bool sent_stream_start; /* flag indicating if stream start has been sent */ + bool sent_any_stream; /* flag indicating if any stream has been sent */ +} PGOutputTxnData; + The struct comment looks stale because it doesn't mention anything about the similar lazy send mechanism for STREAM_START. ~~~ 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn static void pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context, + sizeof(PGOutputTxnData)); + + txndata->sent_begin_txn = false; + txn->output_plugin_private = txndata; +} You don’t need to assign the other members 'sent_stream_start', 'sent_any_stream' because you are doing MemoryContextAllocZero anyway, but for the same reason you did not really need to assign the 'sent_begin_txn' flag either. I guess for consistency maybe it is better to (a) set all of them or (b) set none of them. I prefer (b). ~~~ 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin I feel the 'pgoutput_begin' function is not well named. It makes some of the code where they are called look quite confusing. For streaming there is: 1. pgoutput_stream_start (does not send) 2. pgoutput_send_stream_start (does send) so it is very clear. OTOH there are 3. pgoutput_begin_txn (does not send) 4. pgoutput_begin (does send) For consistency I think the 'pgoutput_begin' name should be changed to include "send" verb 1. pgoutput_begin_txn (does not send) 2. pgoutput_send_begin_txn (does send) ~~~ 7. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema @@ -594,6 +663,20 @@ maybe_send_schema(LogicalDecodingContext *ctx, if (schema_sent) return; + /* set up txndata */ + txndata = toptxn->output_plugin_private; + + /* +* Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE +* is sent. If not, send now. +*/ + if (in_streaming && !txndata->sent_stream_start) + pgoutput_send_stream_start(ctx, toptxn); + else if (txndata && !txndata->sent_begin_txn) + { + pgoutput_begin(ctx, toptxn); + } + How come the in_streaming case is not checking for a NULL txndata before referencing it? Even if it is OK to do that, some more comments or assertions might help for this piece of code. (Stop-Press: see later comments #9, #10) ~~~ 8. src/backend/replication/pgoutput/pgoutput.c - maybe_send_schema @@ -594,6 +663,20 @@ maybe_send_schema(LogicalDecodingContext *ctx, if (schema_sent) return; + /* set up txndata */ + txndata = toptxn->output_plugin_private; + + /* +* Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE +* is sent. If not, send now. +*/ What part of this code is doing anything about "BEGIN PREPARE" ? ~~~ 9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change @@ -1183,6 +1267,15 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Assert(false); } + /* If streaming, send STREAM START if we haven't yet */ + if (in_streaming && (txndata && !txndata->sent_stream_start)) + pgoutput_send_stream_start(ctx, txn); + /* + * Output BEGIN if we haven't yet, unless streaming. + */ + else if (!in_streaming && (txndata && !txndata->sent_begin_txn)) + pgoutput_begin(ctx, txn); + The above code fragment looks more like what IU was expecting should be in 'maybe_send_schema', If you expand it out (and tweak the comments) it can become much less complex looking IMO e.g. if (in_streaming) { /* If streaming, send STREAM START if we haven't yet */ if (txndata &&
RE: logical replication empty transactions
On Feb, Wed 23, 2022 at 10:58 PM Ajin Cherian wrote: > Few comments to V19-0001: 1. I think we should adjust the alignment format. git am ../v19-0001-Skip-empty-transactions-for-logical-replication.patch .git/rebase-apply/patch:197: indent with spaces. * Before we send schema, make sure that STREAM START/BEGIN/BEGIN PREPARE .git/rebase-apply/patch:198: indent with spaces. * is sent. If not, send now. .git/rebase-apply/patch:199: indent with spaces. */ .git/rebase-apply/patch:201: indent with spaces. pgoutput_send_stream_start(ctx, toptxn); .git/rebase-apply/patch:204: indent with spaces. pgoutput_begin(ctx, toptxn); warning: 5 lines add whitespace errors. 2. Structure member initialization. static void pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context, + sizeof(PGOutputTxnData)); + + txndata->sent_begin_txn = false; + txn->output_plugin_private = txndata; +} Do we need to set sent_stream_start and sent_any_stream to false here? 3. Maybe we should add Assert(txndata) like function pgoutput_commit_txn in other functions. 4. In addition, I think we should keep a unified style. a). log style (maybe first one is better.) First style : "Skipping replication of an empty transaction in XXX" Second style : "skipping replication of an empty transaction" b) flag name (maybe second one is better.) First style : variable "sent_begin_txn" in function pgoutput_stream_*. Second style : variable "skip" in function pgoutput_commit_txn. Regards, Wang wei
Re: logical replication empty transactions
On Thu, Feb 17, 2022 at 9:42 PM Amit Kapila wrote: > > On Mon, Jan 31, 2022 at 6:18 PM Ajin Cherian wrote: > > > > Few comments: > = > 1. Is there any particular why the patch is not skipping empty xacts > for streaming (in-progress) transactions as noted in the commit > message as well? > I have added support for skipping streaming transaction. > 2. > +static void > +pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > +{ > bool send_replication_origin = txn->origin_id != InvalidRepOriginId; > + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; > + > + Assert(txndata); > > I think here you can add an assert for sent_begin_txn to be always false? > Added. > 3. > +/* > + * Send BEGIN. > + * This is where the BEGIN is actually sent. This is called > + * while processing the first change of the transaction. > + */ > > Have an empty line between the first two lines to ensure consistency > with nearby comments. Also, the formatting of these lines appears > awkward, either run pgindent or make sure lines are not too short. > Changed. > 4. Do we really need to make any changes in PREPARE > transaction-related functions if can't skip in that case? I think you > can have a check if the output plugin private variable is not set then > ignore special optimization for sending begin. > I have modified this as well. I have also rebased the patch after it did not apply due to a new commit. I will next work on testing and improving the keepalive logic while skipping transactions. regards, Ajin Cherian Fujitsu Australia v19-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
FYI - the latest v18 patch no longer applies due to a recent push [1]. -- [1] https://github.com/postgres/postgres/commit/52e4f0cd472d39d07732b99559989ea3b615be78 Kind Regards, Peter Smith. Fujitsu Australia
Re: logical replication empty transactions
On Fri, Feb 18, 2022 at 3:06 PM osumi.takami...@fujitsu.com wrote: > > On Friday, February 18, 2022 6:18 PM Amit Kapila > wrote: > > On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com > > wrote: > > > > > > On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > > > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > > > > wrote: > > > Changing the timing to send the keepalive to the decoding commit > > > timing didn't look impossible to me, although my suggestion can be > > > ad-hoc. > > > > > > After the initialization of sentPtr(by confirmed_flush lsn), sentPtr > > > is updated from logical_decoding_ctx->reader->EndRecPtr in > > XLogSendLogical. > > > In the XLogSendLogical, we update it after we execute > > LogicalDecodingProcessRecord. > > > This order leads to the current implementation to wait the next > > > iteration to send a keepalive in WalSndWaitForWal. > > > > > > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for > > > updating sentPtr. The end_lsn is the lsn same as the > > > ctx->reader->EndRecPtr, which means advancing the timing to update the > > sentPtr for the commit case. > > > Then if the transaction is empty in synchronous mode, send the > > > keepalive in WalSndUpdateProgress directly, instead of having the > > > force_keepalive_syncrep flag and having it true. > > > > > > > You have a point in that we don't need to delay sending this message till > > next > > WalSndWaitForWal() but I don't see why we need to change anything about > > update of sentPtr. > Yeah, you're right. > Now I think we don't need the update of sentPtr to send a keepalive. > > I thought we can send a keepalive message > after its update in XLogSendLogical or any appropriate place for it after the > existing update. > Yeah, I think there could be multiple ways (a) We can send such a keep alive in WalSndUpdateProgress() itself by using ctx->write_location. For this, we need to modify WalSndKeepalive() to take sentPtr as input. (b) set some flag in WalSndUpdateProgress() and then do it somewhere in WalSndLoop probably in WalSndKeepaliveIfNecessary, or maybe there is another better way. -- With Regards, Amit Kapila.
RE: logical replication empty transactions
On Friday, February 18, 2022 6:18 PM Amit Kapila wrote: > On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com > wrote: > > > > On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > > > wrote: > > Changing the timing to send the keepalive to the decoding commit > > timing didn't look impossible to me, although my suggestion can be > > ad-hoc. > > > > After the initialization of sentPtr(by confirmed_flush lsn), sentPtr > > is updated from logical_decoding_ctx->reader->EndRecPtr in > XLogSendLogical. > > In the XLogSendLogical, we update it after we execute > LogicalDecodingProcessRecord. > > This order leads to the current implementation to wait the next > > iteration to send a keepalive in WalSndWaitForWal. > > > > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for > > updating sentPtr. The end_lsn is the lsn same as the > > ctx->reader->EndRecPtr, which means advancing the timing to update the > sentPtr for the commit case. > > Then if the transaction is empty in synchronous mode, send the > > keepalive in WalSndUpdateProgress directly, instead of having the > > force_keepalive_syncrep flag and having it true. > > > > You have a point in that we don't need to delay sending this message till next > WalSndWaitForWal() but I don't see why we need to change anything about > update of sentPtr. Yeah, you're right. Now I think we don't need the update of sentPtr to send a keepalive. I thought we can send a keepalive message after its update in XLogSendLogical or any appropriate place for it after the existing update. Best Regards, Takamichi Osumi
Re: logical replication empty transactions
On Tue, Feb 8, 2022 at 5:27 AM osumi.takami...@fujitsu.com wrote: > > On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > > wrote: > Changing the timing to send the keepalive to the decoding commit > timing didn't look impossible to me, although my suggestion > can be ad-hoc. > > After the initialization of sentPtr(by confirmed_flush lsn), > sentPtr is updated from logical_decoding_ctx->reader->EndRecPtr in > XLogSendLogical. > In the XLogSendLogical, we update it after we execute > LogicalDecodingProcessRecord. > This order leads to the current implementation to wait the next iteration > to send a keepalive in WalSndWaitForWal. > > But, I felt we can utilize end_lsn passed to ReorderBufferCommit for updating > sentPtr. The end_lsn is the lsn same as the ctx->reader->EndRecPtr, > which means advancing the timing to update the sentPtr for the commit case. > Then if the transaction is empty in synchronous mode, > send the keepalive in WalSndUpdateProgress directly, > instead of having the force_keepalive_syncrep flag and having it true. > You have a point in that we don't need to delay sending this message till next WalSndWaitForWal() but I don't see why we need to change anything about update of sentPtr. -- With Regards, Amit Kapila.
Re: logical replication empty transactions
On Thu, Feb 17, 2022 at 4:12 PM Amit Kapila wrote: > > On Mon, Jan 31, 2022 at 6:18 PM Ajin Cherian wrote: > > > > Few comments: > = > One more comment: @@ -1546,10 +1557,11 @@ WalSndWaitForWal(XLogRecPtr loc) * otherwise idle, this keepalive will trigger a reply. Processing the * reply will update these MyWalSnd locations. */ - if (MyWalSnd->flush < sentPtr && + if (force_keepalive_syncrep || + (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && - !waiting_for_ping_response) - WalSndKeepalive(false); + !waiting_for_ping_response)) + WalSndKeepalive(false); Will this allow syncrep to proceed in case we are skipping the transaction? Won't we need to send a feedback message with 'requestReply' true in this case as we release syncrep waiters while processing standby message, see ProcessStandbyReplyMessage->SyncRepReleaseWaiters. Without 'requestReply', the subscriber might not send any message and the syncrep won't proceed. Why do you decide to delay sending this message till WalSndWaitForWal()? It may not be called for each transaction. I feel we should try to device a test case to test this sync replication mechanism such that without this particular change the sync rep transaction waits momentarily but with this change it doesn't wait. I am not entirely sure whether we can devise an automated test as this is timing related issue but I guess we can at least manually try to produce a case. -- With Regards, Amit Kapila.
Re: logical replication empty transactions
On Wed, Feb 16, 2022 at 2:15 PM osumi.takami...@fujitsu.com wrote: > Another idea would be, to create an empty file under the the > pg_replslot/slotname > with a prefix different from "xid" in the DecodePrepare before the shutdown > if the prepare was empty, and bypass the cleanup of the serialized txns > and check the existence after the restart. But, this is pretty ad-hoc and I > wasn't sure > if to address the corner case of the restart has the strong enough > justification > to create this new file format. > Yes, this doesn't look very efficient. > Therefore, in my humble opinion, the idea of protocol change slightly wins, > since the impact of the protocol change would not be big. We introduced > the protocol version 3 in the devel version and the number of users should be > little. Yes, but we don't want to break backward compatibility for this small added optimization. Amit, I will work on your comments. regards, Ajin Cherian Fujitsu Australia
Re: logical replication empty transactions
On Wed, Feb 16, 2022 at 8:45 AM osumi.takami...@fujitsu.com wrote: [ideas to skip empty prepare/commit_prepare ] > > I feel if we don't want to change the protocol of commit_prepared, > we need to make the publisher solely judge whether the prepare was empty or > not, > after the restart. > > One idea I thought at the beginning was to utilize and apply > the existing mechanism to spill ReorderBufferSerializeTXN object to local > disk, > by postponing the prepare txn object cleanup and when the walsender exits > and commit prepared didn't come, spilling the transaction's data, > then restoring it after the restart in the DecodePrepare. > However, this idea wasn't crash-safe fundamentally. It means, > if the publisher crashes before spilling the empty prepare transaction, > we fail to detect the prepare was empty and come down to send the > commit_prepared > in the situation where the subscriber didn't get the prepare data again. > So, I thought to utilize the spill mechanism didn't work for this purpose. > > Another idea would be, to create an empty file under the the > pg_replslot/slotname > with a prefix different from "xid" in the DecodePrepare before the shutdown > if the prepare was empty, and bypass the cleanup of the serialized txns > and check the existence after the restart. But, this is pretty ad-hoc and I > wasn't sure > if to address the corner case of the restart has the strong enough > justification > to create this new file format. > I think for this idea to work you need to create such an empty file each time we skip empty prepare as the system might crash after prepare and we won't get time to create such a file. I don't think it is advisable to do I/O to save the network message. > Therefore, in my humble opinion, the idea of protocol change slightly wins, > since the impact of the protocol change would not be big. We introduced > the protocol version 3 in the devel version and the number of users should be > little. > There is also the cost of the additional check (whether prepared xact exists) at the time of processing each commit prepared message. I think if we want to go in this direction then it is better to do it via a subscription parameter (say skip_empty_prepare_xact or something like that) so that we can pay the additional cost of such a check conditionally when such a parameter is set by the user. I feel for now we can document in comments why we can't skip empty prepared transactions and maybe as an idea(s) worth exploring to implement the same. OTOH, if multiple agree on such a solution we can even try to implement it and see if that works. -- With Regards, Amit Kapila.
Re: logical replication empty transactions
On Mon, Jan 31, 2022 at 6:18 PM Ajin Cherian wrote: > Few comments: = 1. Is there any particular why the patch is not skipping empty xacts for streaming (in-progress) transactions as noted in the commit message as well? 2. +static void +pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) +{ bool send_replication_origin = txn->origin_id != InvalidRepOriginId; + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; + + Assert(txndata); I think here you can add an assert for sent_begin_txn to be always false? 3. +/* + * Send BEGIN. + * This is where the BEGIN is actually sent. This is called + * while processing the first change of the transaction. + */ Have an empty line between the first two lines to ensure consistency with nearby comments. Also, the formatting of these lines appears awkward, either run pgindent or make sure lines are not too short. 4. Do we really need to make any changes in PREPARE transaction-related functions if can't skip in that case? I think you can have a check if the output plugin private variable is not set then ignore special optimization for sending begin. -- With Regards, Amit Kapila.
RE: logical replication empty transactions
Hi I'll quote one other remaining discussion of this thread again to invoke more attentions from the community. On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > wrote: > > Few other miscellaneous comments: > > 1. > > static void > > pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, > > ReorderBufferTXN *txn, > > - XLogRecPtr commit_lsn) > > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn, TimestampTz > > + prepare_time) > > { > > + PGOutputTxnData*txndata = (PGOutputTxnData *) > txn->output_plugin_private; > > + > > OutputPluginUpdateProgress(ctx); > > > > + /* > > + * If the BEGIN PREPARE was not yet sent, then it means there were no > > + * relevant changes encountered, so we can skip the COMMIT PREPARED > > + * message too. > > + */ > > + if (txndata) > > + { > > + bool skip = !txndata->sent_begin_txn; pfree(txndata); > > + txn->output_plugin_private = NULL; > > > > How is this supposed to work after the restart when prepared is sent > > before the restart and we are just sending commit_prepared after > > restart? Won't this lead to sending commit_prepared even when the > > corresponding prepare is not sent? Can we think of a better way to > > deal with this? > > > > I have tried to resolve this by adding logic in worker,c to silently ignore > spurious > commit_prepareds. But this change required checking if the prepare exists on > the subscriber before attempting the commit_prepared but the current API that > checks this requires prepare time and transaction end_lsn. But for this I had > to > change the protocol of commit_prepared, and I understand that this would > break backward compatibility between subscriber and publisher (you have > raised this issue as well). > I am not sure how else to handle this, let me know if you have any other > ideas. I feel if we don't want to change the protocol of commit_prepared, we need to make the publisher solely judge whether the prepare was empty or not, after the restart. One idea I thought at the beginning was to utilize and apply the existing mechanism to spill ReorderBufferSerializeTXN object to local disk, by postponing the prepare txn object cleanup and when the walsender exits and commit prepared didn't come, spilling the transaction's data, then restoring it after the restart in the DecodePrepare. However, this idea wasn't crash-safe fundamentally. It means, if the publisher crashes before spilling the empty prepare transaction, we fail to detect the prepare was empty and come down to send the commit_prepared in the situation where the subscriber didn't get the prepare data again. So, I thought to utilize the spill mechanism didn't work for this purpose. Another idea would be, to create an empty file under the the pg_replslot/slotname with a prefix different from "xid" in the DecodePrepare before the shutdown if the prepare was empty, and bypass the cleanup of the serialized txns and check the existence after the restart. But, this is pretty ad-hoc and I wasn't sure if to address the corner case of the restart has the strong enough justification to create this new file format. Therefore, in my humble opinion, the idea of protocol change slightly wins, since the impact of the protocol change would not be big. We introduced the protocol version 3 in the devel version and the number of users should be little. Best Regards, Takamichi Osumi
RE: logical replication empty transactions
Hi, Thank you for your updating the patch. I'll quote one of the past discussions in order to make this thread go forward or more active. On Friday, August 13, 2021 8:01 PM Ajin Cherian wrote: > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila > wrote: > > > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > > > > > > > Let's first split the patch for prepared and non-prepared cases as > > that will help to focus on each of them separately. BTW, why haven't > > you considered implementing point 1b as explained by Andres in his > > email [1]? I think we can send a keepalive message in case of > > synchronous replication when we skip an empty transaction, otherwise, > > it might delay in responding to transactions synchronous_commit mode. > > I think in the tests done in the thread, it might not have been shown > > because we are already sending keepalives too frequently. But what if > > someone disables wal_sender_timeout or kept it to a very large value? > > See WalSndKeepaliveIfNecessary. The other thing you might want to look > > at is if the reason for frequent keepalives is the same as described > > in the email [2]. > > > > I have tried to address the comment here by modifying the > ctx->update_progress callback function (WalSndUpdateProgress) provided > for plugins. I have added an option > by which the callback can specify if it wants to send keep_alives. And when > the callback is called with that option set, walsender updates a flag > force_keep_alive_syncrep. > The Walsender in the WalSndWaitForWal for loop, checks this flag and if > synchronous replication is enabled, then sends a keep alive. > Currently this logic > is added as an else to the current logic that is already there in > WalSndWaitForWal, which is probably considered unnecessary and a source of > the keep alive flood that you talked about. So, I can change that according to > how that fix shapes up there. I have also added an extern function in > syncrep.c > that makes it possible for walsender to query if synchronous replication is > turned on. Changing the timing to send the keepalive to the decoding commit timing didn't look impossible to me, although my suggestion can be ad-hoc. After the initialization of sentPtr(by confirmed_flush lsn), sentPtr is updated from logical_decoding_ctx->reader->EndRecPtr in XLogSendLogical. In the XLogSendLogical, we update it after we execute LogicalDecodingProcessRecord. This order leads to the current implementation to wait the next iteration to send a keepalive in WalSndWaitForWal. But, I felt we can utilize end_lsn passed to ReorderBufferCommit for updating sentPtr. The end_lsn is the lsn same as the ctx->reader->EndRecPtr, which means advancing the timing to update the sentPtr for the commit case. Then if the transaction is empty in synchronous mode, send the keepalive in WalSndUpdateProgress directly, instead of having the force_keepalive_syncrep flag and having it true. Best Regards, Takamichi Osumi
Re: logical replication empty transactions
On Sun, Jan 30, 2022 at 7:04 PM osumi.takami...@fujitsu.com wrote: > > On Thursday, January 27, 2022 9:57 PM Ajin Cherian wrote: > Hi, thanks for your patch update. > > > > On Wed, Jan 26, 2022 at 8:33 PM osumi.takami...@fujitsu.com > > wrote: > > > > > > On Tuesday, January 11, 2022 6:43 PM Ajin Cherian > > wrote: > > > (3) Is this patch's reponsibility to intialize the data in > > pgoutput_begin_prepare_txn ? > > > > > > @@ -433,6 +487,8 @@ static void > > > pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, > > > ReorderBufferTXN *txn) { > > > boolsend_replication_origin = txn->origin_id != > > InvalidRepOriginId; > > > + PGOutputTxnData*txndata = > > MemoryContextAllocZero(ctx->context, > > > + > > > + sizeof(PGOutputTxnData)); > > > > > > OutputPluginPrepareWrite(ctx, !send_replication_origin); > > > logicalrep_write_begin_prepare(ctx->out, txn); > > > > > > > > > Even if we need this initialization for either non streaming case or > > > non two_phase case, there can be another issue. > > > We don't free the allocated memory for this data, right ? > > > There's only one place to use free in the entire patch, which is in > > > the pgoutput_commit_txn(). So, corresponding free of memory looked > > > necessary in the two phase commit functions. > > > > > > > Actually it is required for begin_prepare to set the data type, so that the > > checks > > in the pgoutput_change can make sure that the begin prepare is sent. I've > > also > > added a free in commit_prepared code. > Okay, but if we choose the design that this patch takes > care of the initialization in pgoutput_begin_prepare_txn(), > we need another free in pgoutput_rollback_prepared_txn(). > Could you please add some codes similar to pgoutput_commit_prepared_txn() to > the same ? > If we simply execute rollback prepared for non streaming transaction, > we don't free it. > Fixed. > > Some other new minor comments. > > (a) can be "synchronous replication", instead of "Synchronous Replication" > > When we have a look at the syncrep.c, we use the former usually in > a normal comment. > > /* > + * Check if Synchronous Replication is enabled > + */ Fixed. > > (b) move below pgoutput_truncate two codes to the case where if nrelids > 0. > > @@ -770,6 +850,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > int nrelations, Relation relations[], > ReorderBufferChange *change) > { > PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; > + PGOutputTxnData *txndata = (PGOutputTxnData *) > txn->output_plugin_private; > MemoryContext old; > RelationSyncEntry *relentry; > int i; > @@ -777,6 +858,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Oid*relids; > TransactionId xid = InvalidTransactionId; > > + /* If not streaming, should have setup txndata as part of BEGIN/BEGIN > PREPARE */ > + Assert(in_streaming || txndata); > + > Fixed. > (c) fix indent with spaces (for the one sentence of SyncRepEnabled) > > @@ -539,6 +538,15 @@ SyncRepReleaseWaiters(void) > } > > /* > + * Check if Synchronous Replication is enabled > + */ > +bool > +SyncRepEnabled(void) > +{ > +return SyncRepRequested() && ((volatile WalSndCtlData *) > WalSndCtl)->sync_standbys_defined; > +} > + > +/* > > This can be detected by git am. > Fixed. regards, Ajin Cherian Fujitsu Australia v18-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
RE: logical replication empty transactions
On Thursday, January 27, 2022 9:57 PM Ajin Cherian wrote: Hi, thanks for your patch update. > On Wed, Jan 26, 2022 at 8:33 PM osumi.takami...@fujitsu.com > wrote: > > > > On Tuesday, January 11, 2022 6:43 PM Ajin Cherian > wrote: > > (3) Is this patch's reponsibility to intialize the data in > pgoutput_begin_prepare_txn ? > > > > @@ -433,6 +487,8 @@ static void > > pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, > > ReorderBufferTXN *txn) { > > boolsend_replication_origin = txn->origin_id != > InvalidRepOriginId; > > + PGOutputTxnData*txndata = > MemoryContextAllocZero(ctx->context, > > + > > + sizeof(PGOutputTxnData)); > > > > OutputPluginPrepareWrite(ctx, !send_replication_origin); > > logicalrep_write_begin_prepare(ctx->out, txn); > > > > > > Even if we need this initialization for either non streaming case or > > non two_phase case, there can be another issue. > > We don't free the allocated memory for this data, right ? > > There's only one place to use free in the entire patch, which is in > > the pgoutput_commit_txn(). So, corresponding free of memory looked > > necessary in the two phase commit functions. > > > > Actually it is required for begin_prepare to set the data type, so that the > checks > in the pgoutput_change can make sure that the begin prepare is sent. I've also > added a free in commit_prepared code. Okay, but if we choose the design that this patch takes care of the initialization in pgoutput_begin_prepare_txn(), we need another free in pgoutput_rollback_prepared_txn(). Could you please add some codes similar to pgoutput_commit_prepared_txn() to the same ? If we simply execute rollback prepared for non streaming transaction, we don't free it. Some other new minor comments. (a) can be "synchronous replication", instead of "Synchronous Replication" When we have a look at the syncrep.c, we use the former usually in a normal comment. /* + * Check if Synchronous Replication is enabled + */ (b) move below pgoutput_truncate two codes to the case where if nrelids > 0. @@ -770,6 +850,7 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, int nrelations, Relation relations[], ReorderBufferChange *change) { PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; MemoryContext old; RelationSyncEntry *relentry; int i; @@ -777,6 +858,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Oid*relids; TransactionId xid = InvalidTransactionId; + /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */ + Assert(in_streaming || txndata); + (c) fix indent with spaces (for the one sentence of SyncRepEnabled) @@ -539,6 +538,15 @@ SyncRepReleaseWaiters(void) } /* + * Check if Synchronous Replication is enabled + */ +bool +SyncRepEnabled(void) +{ +return SyncRepRequested() && ((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined; +} + +/* This can be detected by git am. Best Regards, Takamichi Osumi
Re: logical replication empty transactions
On Thu, Jan 27, 2022 at 12:16 AM osumi.takami...@fujitsu.com wrote: > > On Tuesday, January 11, 2022 6:43 PM From: Ajin Cherian > wrote: > > Minor update to rebase the patch so that it applies clean on HEAD. > Hi, let me share some additional comments on v16. > > > (1) comment of pgoutput_change > > @@ -630,11 +688,15 @@ pgoutput_change(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Relation relation, ReorderBufferChange > *change) > { > PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; > + PGOutputTxnData *txndata = (PGOutputTxnData *) > txn->output_plugin_private; > MemoryContext old; > RelationSyncEntry *relentry; > TransactionId xid = InvalidTransactionId; > Relationancestor = NULL; > > + /* If not streaming, should have setup txndata as part of BEGIN/BEGIN > PREPARE */ > + Assert(in_streaming || txndata); > + > > In my humble opinion, the comment should not touch BEGIN PREPARE, > because this patch's scope doesn't include two phase commit. > (We could add this in another patch to extend the scope after the commit ?) > We have to include BEGIN PREPARE as well, as the txndata has to be setup. Only difference is that we will not skip empty transaction in BEGIN PREPARE > This applies to pgoutput_truncate's comment. > > (2) "keep alive" should be "keepalive" in WalSndUpdateProgress > > /* > +* When skipping empty transactions in synchronous replication, we > need > +* to send a keep alive to keep the MyWalSnd locations updated. > +*/ > + force_keepalive_syncrep = send_keepalive && SyncRepEnabled(); > + > > Also, this applies to the comment for force_keepalive_syncrep. Fixed. > > (3) Should finish the second sentence with period in the comment of > pgoutput_message. > > @@ -845,6 +923,19 @@ pgoutput_message(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > if (in_streaming) > xid = txn->xid; > > + /* > +* Output BEGIN if we haven't yet. > +* Avoid for streaming and non-transactional messages > Fixed. > (4) "begin" can be changed to "BEGIN" in the comment of PGOutputTxnData > definition. > > In the entire patch, when we express BEGIN message, > we use capital letters "BEGIN" except for one place. > We can apply the same to this place as well. > > +typedef struct PGOutputTxnData > +{ > + bool sent_begin_txn;/* flag indicating whether begin has been > sent */ > +} PGOutputTxnData; > + > Fixed. > (5) inconsistent way to write Assert statements with blank lines > > In the below case, it'd be better to insert one blank line > after the Assert(); > > +static void > +pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > +{ > boolsend_replication_origin = txn->origin_id != > InvalidRepOriginId; > + PGOutputTxnData *txndata = (PGOutputTxnData *) > txn->output_plugin_private; > > + Assert(txndata); > OutputPluginPrepareWrite(ctx, !send_replication_origin); > > Fixed. > (6) new codes in the pgoutput_commit_txn looks messy slightly > > @@ -419,7 +455,25 @@ static void > pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > XLogRecPtr commit_lsn) > { > - OutputPluginUpdateProgress(ctx); > + PGOutputTxnData *txndata = (PGOutputTxnData *) > txn->output_plugin_private; > + boolskip; > + > + Assert(txndata); > + > + /* > +* If a BEGIN message was not yet sent, then it means there were no > relevant > +* changes encountered, so we can skip the COMMIT message too. > +*/ > + skip = !txndata->sent_begin_txn; > + pfree(txndata); > + txn->output_plugin_private = NULL; > + OutputPluginUpdateProgress(ctx, skip); > > Could we conduct a refactoring for this new part ? > IMO, writing codes to free the data structure at the top > of function seems weird. > > One idea is to export some part there > and write a new function, something like below. > > static bool > txn_sent_begin(ReorderBufferTXN *txn) > { > PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; > boolneeds_skip; > > Assert(txndata); > > needs_skip = !txndata->sent_begin_txn; > > pfree(txndata); > txn->output_plugin_private = NULL; > > return needs_skip; > } > > FYI, I had a look at the > v12-0002-Skip-empty-prepared-transactions-for-logical-rep.patch > for reference of pgoutput_rollback_prepared_txn and > pgoutput_commit_prepared_txn. > Looks this kind of function might work for future extensions as well. > What did you think ? I changed a bit, but I'd hold a comprehensive rewrite when a future patch supports skipping empty transactions in two-phase transactions and streaming transactions. regards, Ajin Cherian
Re: logical replication empty transactions
On Wed, Jan 26, 2022 at 8:33 PM osumi.takami...@fujitsu.com wrote: > > On Tuesday, January 11, 2022 6:43 PM Ajin Cherian wrote: > > Minor update to rebase the patch so that it applies clean on HEAD. > Hi, thanks for you rebase. > > Several comments. > > (1) the commit message > > " > transactions, keepalive messages are sent to keep the LSN locations updated > on the standby. > This patch does not skip empty transactions that are "streaming" or > "two-phase". > " > > I suggest that one blank line might be needed before the last paragraph. Changed. > > (2) Could you please remove one pair of curly brackets for one sentence below > ? > > @@ -1546,10 +1557,13 @@ WalSndWaitForWal(XLogRecPtr loc) > * otherwise idle, this keepalive will trigger a reply. > Processing the > * reply will update these MyWalSnd locations. > */ > - if (MyWalSnd->flush < sentPtr && > + if (force_keepalive_syncrep || > + (MyWalSnd->flush < sentPtr && > MyWalSnd->write < sentPtr && > - !waiting_for_ping_response) > + !waiting_for_ping_response)) > + { > WalSndKeepalive(false); > + } > > Changed. > (3) Is this patch's reponsibility to intialize the data in > pgoutput_begin_prepare_txn ? > > @@ -433,6 +487,8 @@ static void > pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN > *txn) > { > boolsend_replication_origin = txn->origin_id != > InvalidRepOriginId; > + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context, > + >sizeof(PGOutputTxnData)); > > OutputPluginPrepareWrite(ctx, !send_replication_origin); > logicalrep_write_begin_prepare(ctx->out, txn); > > > Even if we need this initialization for either non streaming case > or non two_phase case, there can be another issue. > We don't free the allocated memory for this data, right ? > There's only one place to use free in the entire patch, > which is in the pgoutput_commit_txn(). So, > corresponding free of memory looked necessary > in the two phase commit functions. > Actually it is required for begin_prepare to set the data type, so that the checks in the pgoutput_change can make sure that the begin prepare is sent. I've also added a free in commit_prepared code. > (4) SyncRepEnabled's better alignment. > > IIUC, SyncRepEnabled is called not only by the walsender but also by other > backends > via CommitTransaction -> RecordTransactionCommit -> SyncRepWaitForLSN. > Then, the place to add the prototype function for SyncRepEnabled seems not > appropriate, > strictly speaking or requires a comment like /* called by wal sender or other > backends */. > > @@ -90,6 +90,7 @@ extern void SyncRepCleanupAtProcExit(void); > /* called by wal sender */ > extern void SyncRepInitConfig(void); > extern void SyncRepReleaseWaiters(void); > +extern bool SyncRepEnabled(void); > > Even if we intend it is only used by the walsender, the current code place > of SyncRepEnabled in the syncrep.c might not be perfect. > In this file, seemingly we have a section for functions for wal sender > processes > and the place where you wrote it is not here. > > at src/backend/replication/syncrep.c, find a comment below. > /* > * === > * Synchronous Replication functions for wal sender processes > * === > */ Changed. > > (5) minor alignment for expressing a couple of messages. > > @@ -777,6 +846,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Oid*relids; > TransactionId xid = InvalidTransactionId; > > + /* If not streaming, should have setup txndata as part of BEGIN/BEGIN > PREPARE */ > + Assert(in_streaming || txndata); > > > In the commit message, the way you write is below. > ... > skip BEGIN / COMMIT messages for transactions that are empty. The patch > ... > > In this case, we have spaces back and forth for "BEGIN / COMMIT". > Then, I suggest to unify all of those to show better alignment. fixed. regards, Ajin Cherian v17-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
RE: logical replication empty transactions
On Tuesday, January 11, 2022 6:43 PM From: Ajin Cherian wrote: > Minor update to rebase the patch so that it applies clean on HEAD. Hi, let me share some additional comments on v16. (1) comment of pgoutput_change @@ -630,11 +688,15 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Relation relation, ReorderBufferChange *change) { PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; MemoryContext old; RelationSyncEntry *relentry; TransactionId xid = InvalidTransactionId; Relationancestor = NULL; + /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */ + Assert(in_streaming || txndata); + In my humble opinion, the comment should not touch BEGIN PREPARE, because this patch's scope doesn't include two phase commit. (We could add this in another patch to extend the scope after the commit ?) This applies to pgoutput_truncate's comment. (2) "keep alive" should be "keepalive" in WalSndUpdateProgress /* +* When skipping empty transactions in synchronous replication, we need +* to send a keep alive to keep the MyWalSnd locations updated. +*/ + force_keepalive_syncrep = send_keepalive && SyncRepEnabled(); + Also, this applies to the comment for force_keepalive_syncrep. (3) Should finish the second sentence with period in the comment of pgoutput_message. @@ -845,6 +923,19 @@ pgoutput_message(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, if (in_streaming) xid = txn->xid; + /* +* Output BEGIN if we haven't yet. +* Avoid for streaming and non-transactional messages (4) "begin" can be changed to "BEGIN" in the comment of PGOutputTxnData definition. In the entire patch, when we express BEGIN message, we use capital letters "BEGIN" except for one place. We can apply the same to this place as well. +typedef struct PGOutputTxnData +{ + bool sent_begin_txn;/* flag indicating whether begin has been sent */ +} PGOutputTxnData; + (5) inconsistent way to write Assert statements with blank lines In the below case, it'd be better to insert one blank line after the Assert(); +static void +pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) +{ boolsend_replication_origin = txn->origin_id != InvalidRepOriginId; + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; + Assert(txndata); OutputPluginPrepareWrite(ctx, !send_replication_origin); (6) new codes in the pgoutput_commit_txn looks messy slightly @@ -419,7 +455,25 @@ static void pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, XLogRecPtr commit_lsn) { - OutputPluginUpdateProgress(ctx); + PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; + boolskip; + + Assert(txndata); + + /* +* If a BEGIN message was not yet sent, then it means there were no relevant +* changes encountered, so we can skip the COMMIT message too. +*/ + skip = !txndata->sent_begin_txn; + pfree(txndata); + txn->output_plugin_private = NULL; + OutputPluginUpdateProgress(ctx, skip); Could we conduct a refactoring for this new part ? IMO, writing codes to free the data structure at the top of function seems weird. One idea is to export some part there and write a new function, something like below. static bool txn_sent_begin(ReorderBufferTXN *txn) { PGOutputTxnData *txndata = (PGOutputTxnData *) txn->output_plugin_private; boolneeds_skip; Assert(txndata); needs_skip = !txndata->sent_begin_txn; pfree(txndata); txn->output_plugin_private = NULL; return needs_skip; } FYI, I had a look at the v12-0002-Skip-empty-prepared-transactions-for-logical-rep.patch for reference of pgoutput_rollback_prepared_txn and pgoutput_commit_prepared_txn. Looks this kind of function might work for future extensions as well. What did you think ? Best Regards, Takamichi Osumi
RE: logical replication empty transactions
On Tuesday, January 11, 2022 6:43 PM Ajin Cherian wrote: > Minor update to rebase the patch so that it applies clean on HEAD. Hi, thanks for you rebase. Several comments. (1) the commit message " transactions, keepalive messages are sent to keep the LSN locations updated on the standby. This patch does not skip empty transactions that are "streaming" or "two-phase". " I suggest that one blank line might be needed before the last paragraph. (2) Could you please remove one pair of curly brackets for one sentence below ? @@ -1546,10 +1557,13 @@ WalSndWaitForWal(XLogRecPtr loc) * otherwise idle, this keepalive will trigger a reply. Processing the * reply will update these MyWalSnd locations. */ - if (MyWalSnd->flush < sentPtr && + if (force_keepalive_syncrep || + (MyWalSnd->flush < sentPtr && MyWalSnd->write < sentPtr && - !waiting_for_ping_response) + !waiting_for_ping_response)) + { WalSndKeepalive(false); + } (3) Is this patch's reponsibility to intialize the data in pgoutput_begin_prepare_txn ? @@ -433,6 +487,8 @@ static void pgoutput_begin_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { boolsend_replication_origin = txn->origin_id != InvalidRepOriginId; + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context, + sizeof(PGOutputTxnData)); OutputPluginPrepareWrite(ctx, !send_replication_origin); logicalrep_write_begin_prepare(ctx->out, txn); Even if we need this initialization for either non streaming case or non two_phase case, there can be another issue. We don't free the allocated memory for this data, right ? There's only one place to use free in the entire patch, which is in the pgoutput_commit_txn(). So, corresponding free of memory looked necessary in the two phase commit functions. (4) SyncRepEnabled's better alignment. IIUC, SyncRepEnabled is called not only by the walsender but also by other backends via CommitTransaction -> RecordTransactionCommit -> SyncRepWaitForLSN. Then, the place to add the prototype function for SyncRepEnabled seems not appropriate, strictly speaking or requires a comment like /* called by wal sender or other backends */. @@ -90,6 +90,7 @@ extern void SyncRepCleanupAtProcExit(void); /* called by wal sender */ extern void SyncRepInitConfig(void); extern void SyncRepReleaseWaiters(void); +extern bool SyncRepEnabled(void); Even if we intend it is only used by the walsender, the current code place of SyncRepEnabled in the syncrep.c might not be perfect. In this file, seemingly we have a section for functions for wal sender processes and the place where you wrote it is not here. at src/backend/replication/syncrep.c, find a comment below. /* * === * Synchronous Replication functions for wal sender processes * === */ (5) minor alignment for expressing a couple of messages. @@ -777,6 +846,9 @@ pgoutput_truncate(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Oid*relids; TransactionId xid = InvalidTransactionId; + /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */ + Assert(in_streaming || txndata); In the commit message, the way you write is below. ... skip BEGIN / COMMIT messages for transactions that are empty. The patch ... In this case, we have spaces back and forth for "BEGIN / COMMIT". Then, I suggest to unify all of those to show better alignment. Best Regards, Takamichi Osumi
Re: logical replication empty transactions
On Wed, Sep 1, 2021 at 8:57 PM Ajin Cherian wrote: > > Thanks for the comments. Addressed them in the attached patch. > > regards, > Ajin Cherian > Fujitsu Australia Minor update to rebase the patch so that it applies clean on HEAD. regards, Ajin Cherian regards, Ajin Cherian v16-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Wed, Aug 25, 2021 at 5:15 PM Peter Smith wrote: > > I reviewed the v14-0001 patch. > > All my previous comments have been addressed. > > Apply / build / test was all OK. > > -- > > More review comments: > > 1. Params names in the function declarations should match the rest of the > code. > > 1a. src/include/replication/logical.h > > @@ -26,7 +26,8 @@ typedef LogicalOutputPluginWriterWrite > LogicalOutputPluginWriterPrepareWrite; > > typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct > LogicalDecodingContext *lr, > XLogRecPtr Ptr, > - TransactionId xid > + TransactionId xid, > + bool send_keep_alive > > => > Change "send_keep_alive" --> "send_keepalive" > > ~~ > > 1b. src/include/replication/output_plugin.h > > @@ -243,6 +243,6 @@ typedef struct OutputPluginCallbacks > /* Functions in replication/logical/logical.c */ > extern void OutputPluginPrepareWrite(struct LogicalDecodingContext > *ctx, bool last_write); > extern void OutputPluginWrite(struct LogicalDecodingContext *ctx, > bool last_write); > -extern void OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx); > +extern void OutputPluginUpdateProgress(struct LogicalDecodingContext > *ctx, bool send_keep_alive); > > => > Change "send_keep_alive" --> "send_keepalive" > > -- > > 2. Comment should be capitalized - src/backend/replication/walsender.c > > @@ -170,6 +170,9 @@ static TimestampTz last_reply_timestamp = 0; > /* Have we sent a heartbeat message asking for reply, since last reply? */ > static bool waiting_for_ping_response = false; > > +/* force keep alive when skipping transactions in synchronous > replication mode */ > +static bool force_keepalive_syncrep = false; > > => > "force" --> "Force" > > -- > > Otherwise, v14-0001 LGTM. > Thanks for the comments. Addressed them in the attached patch. regards, Ajin Cherian Fujitsu Australia v15-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
I reviewed the v14-0001 patch. All my previous comments have been addressed. Apply / build / test was all OK. -- More review comments: 1. Params names in the function declarations should match the rest of the code. 1a. src/include/replication/logical.h @@ -26,7 +26,8 @@ typedef LogicalOutputPluginWriterWrite LogicalOutputPluginWriterPrepareWrite; typedef void (*LogicalOutputPluginWriterUpdateProgress) (struct LogicalDecodingContext *lr, XLogRecPtr Ptr, - TransactionId xid + TransactionId xid, + bool send_keep_alive => Change "send_keep_alive" --> "send_keepalive" ~~ 1b. src/include/replication/output_plugin.h @@ -243,6 +243,6 @@ typedef struct OutputPluginCallbacks /* Functions in replication/logical/logical.c */ extern void OutputPluginPrepareWrite(struct LogicalDecodingContext *ctx, bool last_write); extern void OutputPluginWrite(struct LogicalDecodingContext *ctx, bool last_write); -extern void OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx); +extern void OutputPluginUpdateProgress(struct LogicalDecodingContext *ctx, bool send_keep_alive); => Change "send_keep_alive" --> "send_keepalive" -- 2. Comment should be capitalized - src/backend/replication/walsender.c @@ -170,6 +170,9 @@ static TimestampTz last_reply_timestamp = 0; /* Have we sent a heartbeat message asking for reply, since last reply? */ static bool waiting_for_ping_response = false; +/* force keep alive when skipping transactions in synchronous replication mode */ +static bool force_keepalive_syncrep = false; => "force" --> "Force" -- Otherwise, v14-0001 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: logical replication empty transactions
On Mon, Aug 16, 2021 at 4:44 PM Peter Smith wrote: > I have reviewed the v13-0001 patch. > > Apply / build / test was all OK > > Below are my code review comments. > > // > > Comments for v13-0001 > = > > 1. Patch comment > > => > > Probably this comment should include some description for the new > "keepalive" logic as well. Added. > > -- > > 2. src/backend/replication/syncrep.c - new function > > @@ -330,6 +330,18 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool commit) > } > > /* > + * Check if Sync Rep is enabled > + */ > +bool > +SyncRepEnabled(void) > +{ > + if (SyncRepRequested() && ((volatile WalSndCtlData *) > WalSndCtl)->sync_standbys_defined) > + return true; > + else > + return false; > +} > + > > 2a. Function comment => > > Why abbreviations in the comment? Why not say "synchronous > replication" instead of "Sync Rep". > Changed. > ~~ > > 2b. if/else => > > Remove the if/else. e.g. > > return SyncRepRequested() && ((volatile WalSndCtlData *) > WalSndCtl)->sync_standbys_defined; > > ~~ Changed. > > 2c. Call the new function => > > There is some existing similar code in SyncRepWaitForLSN(), e.g. > > if (!SyncRepRequested() || > !((volatile WalSndCtlData *) WalSndCtl)->sync_standbys_defined) > return; > > Now that you have a new function you maybe can call it from here, e.g. > > if (!SyncRepEnabled()) > return; > Updated. > -- > > 3. src/backend/replication/walsender.c - whitespace > > + if (send_keep_alive) > + force_keep_alive_syncrep = true; > + > + > > => > > Extra blank line? Removed. > > -- > > 4. src/backend/replication/walsender.c - call keepalive > > if (MyWalSnd->flush < sentPtr && > MyWalSnd->write < sentPtr && > !waiting_for_ping_response) > + { > WalSndKeepalive(false); > + } > + else > + { > + if (force_keep_alive_syncrep && SyncRepEnabled()) > + WalSndKeepalive(false); > + } > > > 4a. Move the SynRepEnabled() call => > > I think it is not necessary to call the SynRepEnabled() here. Instead, > it might be better if this is called back when you assign the > force_keep_alive_syncrep flag. So change the WalSndUpdateProgress, > e.g. > > BEFORE > if (send_keep_alive) > force_keep_alive_syncrep = true; > AFTER > force_keep_alive_syncrep = send_keep_alive && SyncRepEnabled(); > > Note: Also, that assignment also deserves a big comment to say what it is > doing. > > ~~ changed. > > 4b. Change the if/else => > > If you make the change for 4a. then perhaps the keepalive if/else is > overkill and could be changed.e.g. > > if (force_keep_alive_syncrep || > MyWalSnd->flush < sentPtr && > MyWalSnd->write < sentPtr && > !waiting_for_ping_response) > WalSndKeepalive(false); > Changed. regards, Ajin Cherian Fujitsu Australia v14-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Fri, Aug 13, 2021 at 9:01 PM Ajin Cherian wrote: > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila wrote: > > > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > > > > > > > Let's first split the patch for prepared and non-prepared cases as > > that will help to focus on each of them separately. BTW, why haven't > > you considered implementing point 1b as explained by Andres in his > > email [1]? I think we can send a keepalive message in case of > > synchronous replication when we skip an empty transaction, otherwise, > > it might delay in responding to transactions synchronous_commit mode. > > I think in the tests done in the thread, it might not have been shown > > because we are already sending keepalives too frequently. But what if > > someone disables wal_sender_timeout or kept it to a very large value? > > See WalSndKeepaliveIfNecessary. The other thing you might want to look > > at is if the reason for frequent keepalives is the same as described > > in the email [2]. > > > > I have tried to address the comment here by modifying the > ctx->update_progress callback function (WalSndUpdateProgress) provided > for plugins. I have added an option > by which the callback can specify if it wants to send keep_alives. And > when the callback is called with that option set, walsender updates a > flag force_keep_alive_syncrep. > The Walsender in the WalSndWaitForWal for loop, checks this flag and > if synchronous replication is enabled, then sends a keep alive. > Currently this logic > is added as an else to the current logic that is already there in > WalSndWaitForWal, which is probably considered unnecessary and a > source of the keep alive flood > that you talked about. So, I can change that according to how that fix > shapes up there. I have also added an extern function in syncrep.c > that makes it possible > for walsender to query if synchronous replication is turned on. > > The reason I had to turn on a flag and rely on the WalSndWaitForWal to > send the keep alive in its next iteration is because I tried doing > this directly when a > commit is skipped but it didn't work. The reason for this is that when > the commit is being decoded the sentptr at the moment is at the commit > LSN and the keep alive > will be sent for the commit LSN but the syncrep wait is waiting for > end_lsn of the transaction which is the next LSN. So, sending a keep > alive at the moment the > commit is decoded doesn't seem to solve the problem of the waiting > synchronous reply. > > > Few other miscellaneous comments: > > 1. > > static void > > pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, > > ReorderBufferTXN *txn, > > - XLogRecPtr commit_lsn) > > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn, > > + TimestampTz prepare_time) > > { > > + PGOutputTxnData*txndata = (PGOutputTxnData *) > > txn->output_plugin_private; > > + > > OutputPluginUpdateProgress(ctx); > > > > + /* > > + * If the BEGIN PREPARE was not yet sent, then it means there were no > > + * relevant changes encountered, so we can skip the COMMIT PREPARED > > + * message too. > > + */ > > + if (txndata) > > + { > > + bool skip = !txndata->sent_begin_txn; > > + pfree(txndata); > > + txn->output_plugin_private = NULL; > > > > How is this supposed to work after the restart when prepared is sent > > before the restart and we are just sending commit_prepared after > > restart? Won't this lead to sending commit_prepared even when the > > corresponding prepare is not sent? Can we think of a better way to > > deal with this? > > > > I have tried to resolve this by adding logic in worker,c to silently > ignore spurious commit_prepareds. But this change required checking if > the prepare exists on the > subscriber before attempting the commit_prepared but the current API > that checks this requires prepare time and transaction end_lsn. But > for this I had to > change the protocol of commit_prepared, and I understand that this > would break backward compatibility between subscriber and publisher > (you have raised this issue as well). > I am not sure how else to handle this, let me know if you have any > other ideas. One option could be to have another API to check if the > prepare exists on the subscriber with > the prepared 'gid' alone, without checking prepare_time or end_lsn. > Let me know if this idea works. > > I have left out the patch 0002 for prepared transactions until we > arrive at a decision on how to address the above issue. > > Peter, > I have also addressed the comments you've raised on patch 0001, please > have a look and confirm. I have reviewed the v13-0001 patch. Apply / build / test was all OK Below are my code review comments. // Comments for v13-0001 = 1. Patch comment => Probably this comment should include some description for the new "keepalive" logic as well. -- 2. src/backend/replication/syncrep.c - new function @@ -330,6 +330,18 @@ SyncRepWaitForLSN(XLogRecPtr lsn, bool
Re: logical replication empty transactions
On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila wrote: > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > > > > Let's first split the patch for prepared and non-prepared cases as > that will help to focus on each of them separately. BTW, why haven't > you considered implementing point 1b as explained by Andres in his > email [1]? I think we can send a keepalive message in case of > synchronous replication when we skip an empty transaction, otherwise, > it might delay in responding to transactions synchronous_commit mode. > I think in the tests done in the thread, it might not have been shown > because we are already sending keepalives too frequently. But what if > someone disables wal_sender_timeout or kept it to a very large value? > See WalSndKeepaliveIfNecessary. The other thing you might want to look > at is if the reason for frequent keepalives is the same as described > in the email [2]. > I have tried to address the comment here by modifying the ctx->update_progress callback function (WalSndUpdateProgress) provided for plugins. I have added an option by which the callback can specify if it wants to send keep_alives. And when the callback is called with that option set, walsender updates a flag force_keep_alive_syncrep. The Walsender in the WalSndWaitForWal for loop, checks this flag and if synchronous replication is enabled, then sends a keep alive. Currently this logic is added as an else to the current logic that is already there in WalSndWaitForWal, which is probably considered unnecessary and a source of the keep alive flood that you talked about. So, I can change that according to how that fix shapes up there. I have also added an extern function in syncrep.c that makes it possible for walsender to query if synchronous replication is turned on. The reason I had to turn on a flag and rely on the WalSndWaitForWal to send the keep alive in its next iteration is because I tried doing this directly when a commit is skipped but it didn't work. The reason for this is that when the commit is being decoded the sentptr at the moment is at the commit LSN and the keep alive will be sent for the commit LSN but the syncrep wait is waiting for end_lsn of the transaction which is the next LSN. So, sending a keep alive at the moment the commit is decoded doesn't seem to solve the problem of the waiting synchronous reply. > Few other miscellaneous comments: > 1. > static void > pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > - XLogRecPtr commit_lsn) > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn, > + TimestampTz prepare_time) > { > + PGOutputTxnData*txndata = (PGOutputTxnData *) > txn->output_plugin_private; > + > OutputPluginUpdateProgress(ctx); > > + /* > + * If the BEGIN PREPARE was not yet sent, then it means there were no > + * relevant changes encountered, so we can skip the COMMIT PREPARED > + * message too. > + */ > + if (txndata) > + { > + bool skip = !txndata->sent_begin_txn; > + pfree(txndata); > + txn->output_plugin_private = NULL; > > How is this supposed to work after the restart when prepared is sent > before the restart and we are just sending commit_prepared after > restart? Won't this lead to sending commit_prepared even when the > corresponding prepare is not sent? Can we think of a better way to > deal with this? > I have tried to resolve this by adding logic in worker,c to silently ignore spurious commit_prepareds. But this change required checking if the prepare exists on the subscriber before attempting the commit_prepared but the current API that checks this requires prepare time and transaction end_lsn. But for this I had to change the protocol of commit_prepared, and I understand that this would break backward compatibility between subscriber and publisher (you have raised this issue as well). I am not sure how else to handle this, let me know if you have any other ideas. One option could be to have another API to check if the prepare exists on the subscriber with the prepared 'gid' alone, without checking prepare_time or end_lsn. Let me know if this idea works. I have left out the patch 0002 for prepared transactions until we arrive at a decision on how to address the above issue. Peter, I have also addressed the comments you've raised on patch 0001, please have a look and confirm. Regards, Ajin Cherian Fujitsu Australia. v13-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Sat, Aug 7, 2021 at 12:01 AM Ajin Cherian wrote: > > On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila wrote: > > > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > > > > > > > Let's first split the patch for prepared and non-prepared cases as > > that will help to focus on each of them separately. > > As a first shot, I have split the patch into prepared and non-prepared cases, I have reviewed the v12* split patch set. Apply / build / test was all OK Below are my code review comments (mostly cosmetic). // Comments for v12-0001 = 1. Patch comment => This comment as-is might have been OK before the 2PC code was committed, but now that the 2PC is part of the HEAD perhaps this comment needs to be expanded just to say this patch is ONLY for fixing empty transactions for the cases of non-"streaming" and non-"two_phase", and the other kinds will be tackled separately. -- 2. src/backend/replication/pgoutput/pgoutput.c - PGOutputTxnData comment +/* + * Maintain a per-transaction level variable to track whether the + * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE + * is only sent when the first change in a transaction is processed. + * This makes it possible to skip transactions that are empty. + */ => Maybe this is true for the combined v12-0001/v12-0002 case but just for the v12-0001 patch I think it is nor right to imply that some skipping of the BEGIN_PREPARE is possible, because IIUC it isn;t implemented in the *this* patch/ -- 3. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn whitespace + PGOutputTxnData*txndata = MemoryContextAllocZero(ctx->context, + sizeof(PGOutputTxnData)); => Misaligned indentation? -- 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_change brackets + /* + * Output BEGIN if we haven't yet, unless streaming. + */ + if (!in_streaming && !txndata->sent_begin_txn) + { + pgoutput_begin(ctx, txn); + } => The brackets are not needed for the if with a single statement. -- 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_truncate brackets/comment + /* + * output BEGIN if we haven't yet, + * while streaming no need to send BEGIN / BEGIN PREPARE. + */ + if (!in_streaming && !txndata->sent_begin_txn) + { + pgoutput_begin(ctx, txn); + } 5a. => Same as review comment 4. The brackets are not needed for the if with a single statement. 5b. => Notice this code is the same as cited in review comment 4. So probably the code comment should be consistent/same also? -- 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_message brackets + Assert(txndata); + if (!txndata->sent_begin_txn) + { + pgoutput_begin(ctx, txn); + } => The brackets are not needed for the if with a single statement. -- 7. typdefs.list => The structure PGOutputTxnData was added in v12-0001, so the typedefs.list probably should also be updated. // Comments for v12-0002 = 8. Patch comment This patch addresses the above problem by postponing the BEGIN / BEGIN PREPARE messages until the first change is encountered. If (when processing a COMMIT / PREPARE message) we find there had been no other change for that transaction, then do not send the COMMIT / PREPARE message. This means that pgoutput will skip BEGIN / COMMIT or BEGIN PREPARE / PREPARE messages for transactions that are empty. pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages for transactions that are empty. 8a. => I’m not sure this comment is 100% correct for this specific patch. The whole BEGIN/COMMIT was already handled by the v12-0001 patch, right? So really this comment should only be mentioning about BEGIN PREPARE and COMMIT PREPARED I thought. 8b. => I think there should also be some mention that this patch is not handling the "streaming" case of empty tx at all. -- 9. src/backend/replication/logical/proto.c - protocol version @@ -248,8 +250,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn, pq_sendbyte(out, flags); /* send fields */ + pq_sendint64(out, prepare_end_lsn); pq_sendint64(out, commit_lsn); pq_sendint64(out, txn->end_lsn); + pq_sendint64(out, prepare_time); pq_sendint64(out, txn->xact_time.commit_time); pq_sendint32(out, txn->xid); => I agree with a previous feedback comment from Amit - Probably there is some protocol version requirement/implications here because the message format has been changed in logicalrep_write_commit_prepared and logicalrep_read_commit_prepared. e.g. Does this code need to be cognisant of the version and behave differently accordingly? -- 10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_prepare flag moved? + Assert(txndata); OutputPluginPrepareWrite(ctx, !send_replication_origin); logicalrep_write_begin_prepare(ctx->out, txn); + txndata->sent_begin_txn = true; send_repl_origin(ctx, txn->origin_id, txn->origin_lsn, send_replication_origin);
Re: logical replication empty transactions
On Mon, Aug 2, 2021 at 7:20 PM Amit Kapila wrote: > > On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > > > > Let's first split the patch for prepared and non-prepared cases as > that will help to focus on each of them separately. As a first shot, I have split the patch into prepared and non-prepared cases, regards, Ajin Cherian Fujitsu Australia v12-0002-Skip-empty-prepared-transactions-for-logical-rep.patch Description: Binary data v12-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Fri, Jul 23, 2021 at 3:39 PM Ajin Cherian wrote: > Let's first split the patch for prepared and non-prepared cases as that will help to focus on each of them separately. BTW, why haven't you considered implementing point 1b as explained by Andres in his email [1]? I think we can send a keepalive message in case of synchronous replication when we skip an empty transaction, otherwise, it might delay in responding to transactions synchronous_commit mode. I think in the tests done in the thread, it might not have been shown because we are already sending keepalives too frequently. But what if someone disables wal_sender_timeout or kept it to a very large value? See WalSndKeepaliveIfNecessary. The other thing you might want to look at is if the reason for frequent keepalives is the same as described in the email [2]. Few other miscellaneous comments: 1. static void pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, - XLogRecPtr commit_lsn) + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn, + TimestampTz prepare_time) { + PGOutputTxnData*txndata = (PGOutputTxnData *) txn->output_plugin_private; + OutputPluginUpdateProgress(ctx); + /* + * If the BEGIN PREPARE was not yet sent, then it means there were no + * relevant changes encountered, so we can skip the COMMIT PREPARED + * message too. + */ + if (txndata) + { + bool skip = !txndata->sent_begin_txn; + pfree(txndata); + txn->output_plugin_private = NULL; How is this supposed to work after the restart when prepared is sent before the restart and we are just sending commit_prepared after restart? Won't this lead to sending commit_prepared even when the corresponding prepare is not sent? Can we think of a better way to deal with this? 2. @@ -222,8 +224,10 @@ logicalrep_write_commit_prepared(StringInfo out, ReorderBufferTXN *txn, pq_sendbyte(out, flags); /* send fields */ + pq_sendint64(out, prepare_end_lsn); pq_sendint64(out, commit_lsn); pq_sendint64(out, txn->end_lsn); + pq_sendint64(out, prepare_time); Doesn't this means a change of protocol and how is it suppose to work when say publisher is 15 and subscriber from 14 which I think works without such a change? [1] - https://www.postgresql.org/message-id/20200309183018.tzkzwu635sd366ej%40alap3.anarazel.de [2] - https://www.postgresql.org/message-id/CALtH27cip5uQNJb4uHjLXtx1R52ELqXVfcP9fhHr%3DAvFo1dtqw%40mail.gmail.com -- With Regards, Amit Kapila.
RE: logical replication empty transactions
On Friday, July 23, 2021 7:10 PM Ajin Cherian wrote: > On Fri, Jul 23, 2021 at 7:38 PM Peter Smith wrote: > > > > I have reviewed the v10 patch. The patch v11 looks good to me as well. Thanks for addressing my past comments. Best Regards, Takamichi Osumi
Re: logical replication empty transactions
On Fri, Jul 23, 2021 at 8:09 PM Ajin Cherian wrote: > > fixed. The v11 patch LGTM. Regards, Greg Nancarrow Fujitsu Australia
Re: logical replication empty transactions
FYI - I have checked the v11 patch. Everything applies, builds, and tests OK for me, and I have no more review comments. So v11 LGTM. -- Kind Regards, Peter Smith. Fujitsu Australia
Re: logical replication empty transactions
On Fri, Jul 23, 2021 at 7:38 PM Peter Smith wrote: > > I have reviewed the v10 patch. > > Apply / build / test was all OK. > > Just one review comment: > > // > > 1. Typo > > @@ -130,6 +132,17 @@ typedef struct RelationSyncEntry > TupleConversionMap *map; > } RelationSyncEntry; > > +/* > + * Maintain a per-transaction level variable to track whether the > + * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE > + * is only sent when the first change in a transaction is processed. > + * This make it possible to skip transactions that are empty. > + */ > > => > > typo: "make it possible" --> "makes it possible" > fixed. regards, Ajin Cherian Fujitsu Australia v11-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
I have reviewed the v10 patch. Apply / build / test was all OK. Just one review comment: // 1. Typo @@ -130,6 +132,17 @@ typedef struct RelationSyncEntry TupleConversionMap *map; } RelationSyncEntry; +/* + * Maintain a per-transaction level variable to track whether the + * transaction has sent BEGIN or BEGIN PREPARE. BEGIN or BEGIN PREPARE + * is only sent when the first change in a transaction is processed. + * This make it possible to skip transactions that are empty. + */ => typo: "make it possible" --> "makes it possible" -- Kind Regards, Peter Smith. Fujitsu Australia
Re: logical replication empty transactions
On Fri, Jul 23, 2021 at 10:13 AM Peter Smith wrote: > > I have reviewed the v9 patch and my feedback comments are below: > > // > > 1. Apply v9 gave multiple whitespace warnings Fixed. > > -- > > 2. Commit comment - wording > > pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages > for transactions which were skipped. > > => > > Is that correct? Or did you mean to say: > > AFTER > pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages > for transactions that are empty. > > -- Updated. > > 3. src/backend/replication/pgoutput/pgoutput.c - typo > > + /* > + * If the BEGIN PREPARE was not yet sent, then it means there were no > + * relevant changes encountered, so we can skip the COMMIT PREPARED > + * messsage too. > + */ > > Typo: "messsage" --> "message" > > (NOTE this same typo is in 2 places) > Fixed. I have made these changes in v10 of the patch. regards, Ajin Cherian Fujitsu Australia
Re: logical replication empty transactions
On Fri, Jul 23, 2021 at 10:26 AM Greg Nancarrow wrote: > > On Thu, Jul 22, 2021 at 11:37 PM Ajin Cherian wrote: > > > > I have some minor comments on the v9 patch: > > (1) Several whitespace warnings on patch application > Fixed. > (2) Suggested patch comment change: > > BEFORE: > The current logical replication behaviour is to send every transaction to > subscriber even though the transaction is empty (because it does not > AFTER: > The current logical replication behaviour is to send every transaction to > subscriber even though the transaction might be empty (because it does not > Changed accordingly. > (3) Comment needed for added struct defn: > > typedef struct PGOutputTxnData > Added. > (4) Improve comment. > > Can you add a comma (or add words) in the below sentence, so we know > how to read it? > Updated. regards, Ajin Cherian Fujitsu Australia v10-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Thu, Jul 22, 2021 at 11:37 PM Ajin Cherian wrote: > I have some minor comments on the v9 patch: (1) Several whitespace warnings on patch application (2) Suggested patch comment change: BEFORE: The current logical replication behaviour is to send every transaction to subscriber even though the transaction is empty (because it does not AFTER: The current logical replication behaviour is to send every transaction to subscriber even though the transaction might be empty (because it does not (3) Comment needed for added struct defn: typedef struct PGOutputTxnData (4) Improve comment. Can you add a comma (or add words) in the below sentence, so we know how to read it? + /* + * Delegate to assign the begin sent flag as false same as for the + * BEGIN message. + */ Regards, Greg Nancarrow Fujitsu Australia
Re: logical replication empty transactions
I have reviewed the v9 patch and my feedback comments are below: // 1. Apply v9 gave multiple whitespace warnings $ git apply v9-0001-Skip-empty-transactions-for-logical-replication.patch v9-0001-Skip-empty-transactions-for-logical-replication.patch:479: indent with spaces. * If the BEGIN PREPARE was not yet sent, then it means there were no v9-0001-Skip-empty-transactions-for-logical-replication.patch:480: indent with spaces. * relevant changes encountered, so we can skip the ROLLBACK PREPARED v9-0001-Skip-empty-transactions-for-logical-replication.patch:481: indent with spaces. * messsage too. v9-0001-Skip-empty-transactions-for-logical-replication.patch:482: indent with spaces. */ warning: 4 lines add whitespace errors. -- 2. Commit comment - wording pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages for transactions which were skipped. => Is that correct? Or did you mean to say: AFTER pgoutput will also skip COMMIT PREPARED and ROLLBACK PREPARED messages for transactions that are empty. -- 3. src/backend/replication/pgoutput/pgoutput.c - typo + /* + * If the BEGIN PREPARE was not yet sent, then it means there were no + * relevant changes encountered, so we can skip the COMMIT PREPARED + * messsage too. + */ Typo: "messsage" --> "message" (NOTE this same typo is in 2 places) -- Kind Regards, Peter Smith. Fujitsu Australia
Re: logical replication empty transactions
On Thu, Jul 22, 2021 at 6:11 PM Peter Smith wrote: > > Hi Ajin. > > I have reviewed the v8 patch and my feedback comments are below: > > // > > 1. Apply v8 gave multiple whitespace warnings. > > -- > > 2. Commit comment - wording > > If (when processing a COMMIT / PREPARE message) we find there had been > no other change for that transaction, then do not send the COMMIT / > PREPARE message. This means that pgoutput will skip BEGIN / COMMIT > or BEGIN PREPARE / PREPARE messages for transactions that are empty. > > => > > Shouldn't this also mention some other messages that may be skipped? > - COMMIT PREPARED > - ROLLBACK PREPARED > Updated. > -- > > 3. doc/src/sgml/logicaldecoding.sgml - wording > > @@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct > LogicalDecodingContext *ctx, >The required commit_prepared_cb callback is called >whenever a transaction COMMIT PREPARED has > been decoded. >The gid field, which is part of the > - txn parameter, can be used in this callback. > + txn parameter, can be used in this callback. The > + parameters prepare_end_lsn and > + prepare_time can be used to check if the plugin > + has received this PREPARE TRANSACTION command or > not. > + If yes, it can commit the transaction, otherwise, it can skip the > commit. > + The gid alone is not sufficient to determine > this > + because the downstream may already have a prepared transaction with the > + same identifier. > > => > > Typo: Should that say "downstream node" instead of just "downstream" ? > > -- Updated. > > 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn > callback comment > > @@ -406,14 +413,38 @@ pgoutput_startup(LogicalDecodingContext *ctx, > OutputPluginOptions *opt, > > /* > * BEGIN callback > + * Don't send BEGIN message here. Instead, postpone it until the first > + * change. In logical replication, a common scenario is to replicate a set > + * of tables (instead of all tables) and transactions whose changes were on > > => > > Typo: "BEGIN callback" --> "BEGIN callback." (with the period). > > And, I think maybe it will be better if it has a separating blank line too. > > e.g. > > /* > * BEGIN callback. > * > * Don't send BEGIN > > (NOTE: this review comment applies to other callback function comments > too, so please hunt them all down) > > -- Updated. > > 5. src/backend/replication/pgoutput/pgoutput.c - data / txndata > > static void > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > { > + PGOutputTxnData*data = MemoryContextAllocZero(ctx->context, > + sizeof(PGOutputTxnData)); > > => > > There is some inconsistent naming of the local variable in the patch. > Sometimes it is called "data"; Sometimes it is called "txdata" etc. It > would be better to just stick with the same variable name everywhere. > > (NOTE: this comment applies to several places in this patch) > > -- I've changed all occurance of PGOutputTxnData to txndata. Note that there is another structure PGOutputData which still uses the name data. > > 6. src/backend/replication/pgoutput/pgoutput.c - Strange way to use Assert > > + /* If not streaming, should have setup txndata as part of > BEGIN/BEGIN PREPARE */ > + if (!in_streaming) > + Assert(txndata); > + > > => > > This style of Assert code seemed strange to me. In production mode > isn't that going to evaluate to some condition with a ((void) true) > body? IMO it might be better to just include the streaming check as > part of the Assert. For example: > > BEFORE > if (!in_streaming) > Assert(txndata); > > AFTER > Assert(in_streaming || txndata); > > (NOTE: This same review comment applies in at least 3 places in this > patch, so please hunt them all down) > Updated. > -- > > 7. src/backend/replication/pgoutput/pgoutput.c - comment wording > > @@ -677,6 +810,18 @@ pgoutput_change(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Assert(false); > } > > + /* > + * output BEGIN / BEGIN PREPARE if we haven't yet, > + * while streaming no need to send BEGIN / BEGIN PREPARE. > + */ > + if (!in_streaming && !txndata->sent_begin_txn) > > => > > English not really that comment is. The comment should also start with > uppercase. > > (NOTE: This same comment was in couple of places in the patch) > Updated. regards, Ajin Cherian Fujitsu Australia v9-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
Hi Ajin. I have reviewed the v8 patch and my feedback comments are below: // 1. Apply v8 gave multiple whitespace warnings. -- 2. Commit comment - wording If (when processing a COMMIT / PREPARE message) we find there had been no other change for that transaction, then do not send the COMMIT / PREPARE message. This means that pgoutput will skip BEGIN / COMMIT or BEGIN PREPARE / PREPARE messages for transactions that are empty. => Shouldn't this also mention some other messages that may be skipped? - COMMIT PREPARED - ROLLBACK PREPARED -- 3. doc/src/sgml/logicaldecoding.sgml - wording @@ -884,11 +884,20 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx, The required commit_prepared_cb callback is called whenever a transaction COMMIT PREPARED has been decoded. The gid field, which is part of the - txn parameter, can be used in this callback. + txn parameter, can be used in this callback. The + parameters prepare_end_lsn and + prepare_time can be used to check if the plugin + has received this PREPARE TRANSACTION command or not. + If yes, it can commit the transaction, otherwise, it can skip the commit. + The gid alone is not sufficient to determine this + because the downstream may already have a prepared transaction with the + same identifier. => Typo: Should that say "downstream node" instead of just "downstream" ? -- 4. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn callback comment @@ -406,14 +413,38 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, /* * BEGIN callback + * Don't send BEGIN message here. Instead, postpone it until the first + * change. In logical replication, a common scenario is to replicate a set + * of tables (instead of all tables) and transactions whose changes were on => Typo: "BEGIN callback" --> "BEGIN callback." (with the period). And, I think maybe it will be better if it has a separating blank line too. e.g. /* * BEGIN callback. * * Don't send BEGIN (NOTE: this review comment applies to other callback function comments too, so please hunt them all down) -- 5. src/backend/replication/pgoutput/pgoutput.c - data / txndata static void pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { + PGOutputTxnData*data = MemoryContextAllocZero(ctx->context, + sizeof(PGOutputTxnData)); => There is some inconsistent naming of the local variable in the patch. Sometimes it is called "data"; Sometimes it is called "txdata" etc. It would be better to just stick with the same variable name everywhere. (NOTE: this comment applies to several places in this patch) -- 6. src/backend/replication/pgoutput/pgoutput.c - Strange way to use Assert + /* If not streaming, should have setup txndata as part of BEGIN/BEGIN PREPARE */ + if (!in_streaming) + Assert(txndata); + => This style of Assert code seemed strange to me. In production mode isn't that going to evaluate to some condition with a ((void) true) body? IMO it might be better to just include the streaming check as part of the Assert. For example: BEFORE if (!in_streaming) Assert(txndata); AFTER Assert(in_streaming || txndata); (NOTE: This same review comment applies in at least 3 places in this patch, so please hunt them all down) -- 7. src/backend/replication/pgoutput/pgoutput.c - comment wording @@ -677,6 +810,18 @@ pgoutput_change(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, Assert(false); } + /* + * output BEGIN / BEGIN PREPARE if we haven't yet, + * while streaming no need to send BEGIN / BEGIN PREPARE. + */ + if (!in_streaming && !txndata->sent_begin_txn) => English not really that comment is. The comment should also start with uppercase. (NOTE: This same comment was in couple of places in the patch) -- Kind Regards, Peter Smith. Fujitsu Australia
Re: logical replication empty transactions
On Thu, Jul 15, 2021 at 3:50 PM osumi.takami...@fujitsu.com wrote: > I started to test this patch but will give you some really minor quick > feedbacks. > > (1) pg_logical_slot_get_binary_changes() params. > > Technically, looks better to have proto_version 3 & two_phase option for the > function > to test empty prepare ? I felt proto_version 1 doesn't support 2PC. > [1] says "The following messages (Begin Prepare, Prepare, Commit Prepared, > Rollback Prepared) > are available since protocol version 3." Then, if the test wants to skip > empty *prepares*, > I suggest to update the proto_version and set two_phase 'on'. Updated accordingly. > (2) The following sentences may start with a lowercase letter. > There are other similar codes for this. > > + elog(DEBUG1, "Skipping replication of an empty transaction"); Fixed this. I've addressed these comments in version 8 of the patch. regards, Ajin Cherian Fujitsu Australia
Re: logical replication empty transactions
On Mon, Jul 19, 2021 at 3:24 PM Peter Smith wrote: > 1a. Commit Comment - wording > updated. > > 1b. Commit Comment - wording > updated. > 2. doc/src/sgml/logicaldecoding.sgml - wording > > @@ -884,11 +884,19 @@ typedef void (*LogicalDecodePrepareCB) (struct > LogicalDecodingContext *ctx, >The required commit_prepared_cb callback is called >whenever a transaction COMMIT PREPARED has > been decoded. >The gid field, which is part of the > - txn parameter, can be used in this callback. > + txn parameter, can be used in this callback. The > + parameters prepare_end_lsn and > + prepare_time can be used to check if the plugin > + has received this PREPARE TRANSACTION in which case > + it can commit the transaction, otherwise, it can skip the commit. The > + gid alone is not sufficient because the > downstream > + node can have a prepared transaction with the same identifier. > > => > > (some minor rewording of the last part) updated. > > 3. src/backend/replication/logical/proto.c - whitespace > > @@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, > LogicalRepCommitPreparedTxnData * > elog(ERROR, "unrecognized flags %u in commit prepared message", flags); > > /* read fields */ > + prepare_data->prepare_end_lsn = pq_getmsgint64(in); > + if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr) > + elog(ERROR,"prepare_end_lsn is not set in commit prepared message"); > > => > > There is missing space before the 2nd elog param. > fixed. > > 4a. => > > "and was essentially an empty prepare" --> "so was essentially an empty > prepare" > > 4b. => > > "In which case" --> "In this case" > > -- fixed. > I felt that since this message postponement is now the new behaviour > of this function then probably this should all be a function level > comment instead of the comment being in the body of the function > > -- > > 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin > > + > +static void > +pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > > => > > Even though it is kind of obvious, it is probably better to provide a > function comment here too > > -- Changed accordingly. > > I felt that the comment "skip COMMIT message if nothing was sent" > should be done at the point where you *decide* to skip or not. So you > could either move that comment to where the skip variable is assigned. > Or (my preference) leave the comment where it is but change the > variable name to be sent_begin = !data->sent_begin_txn; > Updated the comment to where the skip variable is assigned. > -- > > Regardless I think the comment should be elaborated a bit to describe > the reason more. > > 7b. => > > BEFORE > /* skip COMMIT message if nothing was sent */ > > AFTER > /* If a BEGIN message was not yet sent, then it means there were no > relevant changes encountered, so we can skip the COMMIT message too. > */ > Updated accordingly. > -- > Like previously, I felt that this big comment should be at the > function level of pgoutput_begin_prepare_txn instead of in the body of > the function. > > -- > > 8b. => > > And then the body comment would be something simple like: > > /* Delegate to assign the begin sent flag as false same as for the > BEGIN message. */ > pgoutput_begin_txn(ctx, txn); > Updated accordingly. > -- > > 9. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_prepare > > + > +static void > +pgoutput_begin_prepare(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > > => > > Probably this needs a function comment. > Updated. > -- > > 10. src/backend/replication/pgoutput/pgoutput.c - pgoutput_prepare_txn > > @@ -459,8 +520,18 @@ static void > pgoutput_prepare_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > XLogRecPtr prepare_lsn) > { > + PGOutputTxnData*data = (PGOutputTxnData *) txn->output_plugin_private; > + > + Assert(data); > OutputPluginUpdateProgress(ctx); > > + /* skip PREPARE message if nothing was sent */ > + if (!data->sent_begin_txn) > > => > > Maybe elaborate on that "skip PREPARE message if nothing was sent" > comment in a way similar to my review comment 7b. For example, > > AFTER > /* If the BEGIN was not yet sent, then it means there were no relevant > changes encountered, so we can skip the PREPARE message too. */ > Updated. > -- > > 11. src/backend/replication/pgoutput/pgoutput.c - pgoutput_commit_prepared_txn > > @@ -471,12 +542,33 @@ pgoutput_prepare_txn(LogicalDecodingContext > *ctx, ReorderBufferTXN *txn, > */ > static void > pgoutput_commit_prepared_txn(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > - XLogRecPtr commit_lsn) > + XLogRecPtr commit_lsn, XLogRecPtr prepare_end_lsn, > + TimestampTz prepare_time) > { > + PGOutputTxnData*data = (PGOutputTxnData *) txn->output_plugin_private; > + > OutputPluginUpdateProgress(ctx); > > + /* > + * skip sending COMMIT PREPARED message if prepared
Re: logical replication empty transactions
Hi Ajin, I have reviewed the v7 patch and given my feedback comments below. Apply OK Build OK make check OK TAP (subscriptions) make check OK Build PG Docs (html) OK Although I made lots of review comments below, the important point is that none of them are functional - they are only minore re-wordings and some code refactoring that I thought would make the code simpler and/or easier to read. YMMV, so please feel free to disagree with any of them. // 1a. Commit Comment - wording BEFORE This patch addresses the above problem by postponing the BEGIN / BEGIN PREPARE message until the first change. AFTER This patch addresses the above problem by postponing the BEGIN / BEGIN PREPARE messages until the first change is encountered. -- 1b. Commit Comment - wording BEFORE While processing a COMMIT message or a PREPARE message, if there is no other change for that transaction, do not send COMMIT message or PREPARE message. AFTER If (when processing a COMMIT / PREPARE message) we find there had been no other change for that transaction, then do not send the COMMIT / PREPARE message. -- 2. doc/src/sgml/logicaldecoding.sgml - wording @@ -884,11 +884,19 @@ typedef void (*LogicalDecodePrepareCB) (struct LogicalDecodingContext *ctx, The required commit_prepared_cb callback is called whenever a transaction COMMIT PREPARED has been decoded. The gid field, which is part of the - txn parameter, can be used in this callback. + txn parameter, can be used in this callback. The + parameters prepare_end_lsn and + prepare_time can be used to check if the plugin + has received this PREPARE TRANSACTION in which case + it can commit the transaction, otherwise, it can skip the commit. The + gid alone is not sufficient because the downstream + node can have a prepared transaction with the same identifier. => (some minor rewording of the last part) AFTER: The parameters prepare_end_lsn and prepare_time can be used to check if the plugin has received this PREPARE TRANSACTION or not. If yes, it can commit the transaction, otherwise, it can skip the commit. The gid alone is not sufficient to determine this because the downstream node may already have a prepared transaction with the same identifier. -- 3. src/backend/replication/logical/proto.c - whitespace @@ -244,12 +248,16 @@ logicalrep_read_commit_prepared(StringInfo in, LogicalRepCommitPreparedTxnData * elog(ERROR, "unrecognized flags %u in commit prepared message", flags); /* read fields */ + prepare_data->prepare_end_lsn = pq_getmsgint64(in); + if (prepare_data->prepare_end_lsn == InvalidXLogRecPtr) + elog(ERROR,"prepare_end_lsn is not set in commit prepared message"); => There is missing space before the 2nd elog param. -- 4. src/backend/replication/logical/worker.c - comment typos /* - * Update origin state so we can restart streaming from correct position - * in case of crash. + * It is possible that we haven't received the prepare because + * the transaction did not have any changes relevant to this + * subscription and was essentially an empty prepare. In which case, + * the walsender is optimized to drop the empty transaction and the + * accompanying prepare. Silently ignore if we don't find the prepared + * transaction. */ 4a. => "and was essentially an empty prepare" --> "so was essentially an empty prepare" 4b. => "In which case" --> "In this case" -- 5. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin_txn @@ -410,10 +417,32 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, static void pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { + PGOutputTxnData*data = MemoryContextAllocZero(ctx->context, + sizeof(PGOutputTxnData)); + + /* + * Don't send BEGIN message here. Instead, postpone it until the first + * change. In logical replication, a common scenario is to replicate a set + * of tables (instead of all tables) and transactions whose changes were on + * table(s) that are not published will produce empty transactions. These + * empty transactions will send BEGIN and COMMIT messages to subscribers, + * using bandwidth on something with little/no use for logical replication. + */ + data->sent_begin_txn = false; + txn->output_plugin_private = data; +} => I felt that since this message postponement is now the new behaviour of this function then probably this should all be a function level comment instead of the comment being in the body of the function -- 6. src/backend/replication/pgoutput/pgoutput.c - pgoutput_begin + +static void +pgoutput_begin(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) => Even though it is kind of obvious, it is probably better to provide a function comment here too -- 7. src/backend/replication/pgoutput/pgoutput.c - pgoutput_commit_txn @@ -428,8 +457,22 @@ static void pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN
RE: logical replication empty transactions
On Wednesday, July 14, 2021 9:30 PM Ajin Cherian wrote: > I've had to rebase the patch after a recent commit by Amit Kapila of > supporting > two-phase commits in pub-sub [1]. > Also I've modified the patch to also skip replicating empty prepared > transactions. Do let me know if you have any comments. Hi I started to test this patch but will give you some really minor quick feedbacks. (1) pg_logical_slot_get_binary_changes() params. Technically, looks better to have proto_version 3 & two_phase option for the function to test empty prepare ? I felt proto_version 1 doesn't support 2PC. [1] says "The following messages (Begin Prepare, Prepare, Commit Prepared, Rollback Prepared) are available since protocol version 3." Then, if the test wants to skip empty *prepares*, I suggest to update the proto_version and set two_phase 'on'. +## +# Test empty prepares +## ... +# peek at the contents of the slot +$result = $node_publisher->safe_psql( + 'postgres', qq( + SELECT get_byte(data, 0) + FROM pg_logical_slot_get_binary_changes('tap_sub', NULL, NULL, + 'proto_version', '1', + 'publication_names', 'tap_pub') +)); (2) The following sentences may start with a lowercase letter. There are other similar codes for this. + elog(DEBUG1, "Skipping replication of an empty transaction"); [1] - https://www.postgresql.org/docs/devel/protocol-logicalrep-message-formats.html Best Regards, Takamichi Osumi
Re: logical replication empty transactions
On Thu, May 27, 2021 at 8:58 PM vignesh C wrote: > Thanks for the updated patch, few comments: > 1) I'm not sure if we could add some tests for skip empty > transactions, if possible add a few tests. > Added a few tests for prepared transactions as well as the existing test in 020_messages.pl also tests regular transactions. > 2) We could add some debug level log messages for the transaction that > will be skipped. Added. > > 3) You could keep this variable below the other bool variables in the > structure: > + boolsent_begin_txn; /* flag indicating whether begin > + > * has already been sent */ > + I've moved this variable around, so this comment no longer is valid. > > 4) You can split the comments to multi-line as it exceeds 80 chars > + /* output BEGIN if we haven't yet, avoid for streaming and > non-transactional messages */ > + if (!data->sent_begin_txn && !in_streaming && transactional) > + pgoutput_begin(ctx, txn); Done. I've had to rebase the patch after a recent commit by Amit Kapila of supporting two-phase commits in pub-sub [1]. Also I've modified the patch to also skip replicating empty prepared transactions. Do let me know if you have any comments. regards, Ajin Cherian Fujitsu Australia [1]- https://www.postgresql.org/message-id/CAHut+PueG6u3vwG8DU=JhJiWa2TwmZ=bdqpchzkbky7ykza...@mail.gmail.com v7-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Tue, May 25, 2021 at 6:36 PM Ajin Cherian wrote: > > On Tue, Apr 27, 2021 at 1:49 PM Ajin Cherian wrote: > > Rebased the patch as it was no longer applying. Thanks for the updated patch, few comments: 1) I'm not sure if we could add some tests for skip empty transactions, if possible add a few tests. 2) We could add some debug level log messages for the transaction that will be skipped. 3) You could keep this variable below the other bool variables in the structure: + boolsent_begin_txn; /* flag indicating whether begin + * has already been sent */ + 4) You can split the comments to multi-line as it exceeds 80 chars + /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */ + if (!data->sent_begin_txn && !in_streaming && transactional) + pgoutput_begin(ctx, txn); Regards, Vignesh
Re: logical replication empty transactions
On Tue, Apr 27, 2021 at 1:49 PM Ajin Cherian wrote: Rebased the patch as it was no longer applying. regards, Ajin Cherian Fujitsu Australia v6-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Mon, Apr 26, 2021 at 4:29 PM Peter Smith wrote: > The v4 patch applied cleanly. > > make check-world completed successfully. > > So this patch v4 looks LGTM, apart from the following 2 nitpick comments: > > == > > 1. Suggest to add a blank line after the (void)txn; ? > > @@ -345,10 +345,29 @@ pgoutput_startup(LogicalDecodingContext *ctx, > OutputPluginOptions *opt, > static void > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > { > + PGOutputData *data = (PGOutputData *) ctx->output_plugin_private; > + > + (void)txn; /* keep compiler quiet */ > + /* > + * Don't send BEGIN message here. Instead, postpone it until the first > > Fixed. > == > > 2. Unnecessary statement blocks? > > AFAIK those { } are not the usual PG code-style when there is only one > statement, so suggest to remove them. > > Appies to 3 places: > > @@ -551,6 +576,12 @@ pgoutput_change(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > Assert(false); > } > > + /* output BEGIN if we haven't yet */ > + if (!data->sent_begin_txn && !in_streaming) > + { > + pgoutput_begin(ctx, txn); > + } > > @@ -693,6 +724,12 @@ pgoutput_truncate(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > > if (nrelids > 0) > { > + /* output BEGIN if we haven't yet */ > + if (!data->sent_begin_txn && !in_streaming) > + { > + pgoutput_begin(ctx, txn); > + } > > @@ -725,6 +762,12 @@ pgoutput_message(LogicalDecodingContext *ctx, > ReorderBufferTXN *txn, > if (in_streaming) > xid = txn->xid; > > + /* output BEGIN if we haven't yet, avoid for streaming and > non-transactional messages */ > + if (!data->sent_begin_txn && !in_streaming && transactional) > + { > + pgoutput_begin(ctx, txn); > + } > Fixed. regards, Ajin Cherian Fujitsu Australia v5-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Fri, Apr 23, 2021 at 3:46 PM Ajin Cherian wrote: > > > > On Mon, Apr 19, 2021 at 6:22 PM Peter Smith wrote: >> >> >> Here are a some review comments: >> >> -- >> >> 1. The patch v3 applied OK but with whitespace warnings >> >> [postgres@CentOS7-x64 oss_postgres_2PC]$ git apply >> ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch >> ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:98: >> indent with spaces. >> /* output BEGIN if we haven't yet, avoid for streaming and >> non-transactional messages */ >> ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:99: >> indent with spaces. >> if (!data->xact_wrote_changes && !in_streaming && transactional) >> warning: 2 lines add whitespace errors. >> >> -- > > > Fixed. > >> >> >> 2. Please create a CF entry in [1] for this patch. >> >> -- >> >> 3. Patch comment >> >> The comment describes the problem and then suddenly just says >> "Postpone the BEGIN message until the first change." >> >> I suggest changing it to say more like... "(blank line) This patch >> addresses the above problem by postponing the BEGIN message until the >> first change." >> >> -- >> > > Updated. > >> >> 4. pgoutput.h >> >> Maybe for consistency with the context member, the comment for the new >> member should be to the right instead of above it? >> >> @@ -20,6 +20,9 @@ typedef struct PGOutputData >> MemoryContext context; /* private memory context for transient >> * allocations */ >> >> + /* flag indicating whether messages have previously been sent */ >> + boolxact_wrote_changes; >> + >> >> -- >> >> 5. pgoutput.h >> >> + /* flag indicating whether messages have previously been sent */ >> >> "previously been sent" --> "already been sent" ?? >> >> -- >> >> 6. pgoutput.h - misleading member name >> >> Actually, now that I have read all the rest of the code and how this >> member is used I feel that this name is very misleading. e.g. For >> "streaming" case then you still are writing changes but are not >> setting this member at all - therefore it does not always mean what it >> says. >> >> I feel a better name for this would be something like >> "sent_begin_txn". Then if you have sent BEGIN it is true. If you >> haven't sent BEGIN it is false. It eliminates all ambiguity naming it >> this way instead. >> >> (This makes my feedback #5 redundant because the comment will be a bit >> different if you do this). >> >> -- > > > Fixed above comments. >> >> >> 7. pgoutput.c - function pgoutput_begin_txn >> >> @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, >> OutputPluginOptions *opt, >> static void >> pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) >> { >> >> I guess that you still needed to pass the txn because that is how the >> API is documented, right? >> >> But I am wondering if you ought to flag it as unused so you wont get >> some BF machine giving warnings about it. >> >> e.g. Syntax like this? >> >> pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN * txn) { >> (void)txn; >> ... > > > Updated. >> >> -- >> >> 8. pgoutput.c - function pgoutput_begin_txn >> >> @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, >> OutputPluginOptions *opt, >> static void >> pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) >> { >> + PGOutputData *data = ctx->output_plugin_private; >> + >> + /* >> + * Don't send BEGIN message here. Instead, postpone it until the first >> + * change. In logical replication, a common scenario is to replicate a set >> + * of tables (instead of all tables) and transactions whose changes were on >> + * table(s) that are not published will produce empty transactions. These >> + * empty transactions will send BEGIN and COMMIT messages to subscribers, >> + * using bandwidth on something with little/no use for logical replication. >> + */ >> + data->xact_wrote_changes = false; >> + elog(LOG,"Holding of begin"); >> +} >> >> Why is this loglevel LOG? Looks like leftover debugging. > > > Removed. >> >> >> -- >> >> 9. pgoutput.c - function pgoutput_commit_txn >> >> @@ -384,8 +401,14 @@ static void >> pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, >> XLogRecPtr commit_lsn) >> { >> + PGOutputData *data = ctx->output_plugin_private; >> + >> OutputPluginUpdateProgress(ctx); >> >> + /* skip COMMIT message if nothing was sent */ >> + if (!data->xact_wrote_changes) >> + return; >> + >> >> In the case where you decided to do nothing does it make sense that >> you still called the function OutputPluginUpdateProgress(ctx); ? >> I thought perhaps that your new check should come first so this call >> would never happen. > > > Even though the empty transaction is not sent, the LSN is tracked as decoded, > hence the progress needs to be updated. > >> >> -- >> >> 10. pgoutput.c - variable declarations without casts >> >> +
Re: logical replication empty transactions
An earlier comment from Anders: > We could e.g. have a new LogicalDecodingContext callback that is > called whenever WalSndWaitForWal() would wait. That'd check if there's > a pending "need" to send out a 'empty transaction'/feedback request > message. The "need" flag would get cleared whenever we send out data > bearing an LSN for other reasons. > I think the current Keep Alive messages already achieve this by sending the current LSN as part of the Keep Alive messages. /* construct the message... */ resetStringInfo(_message); pq_sendbyte(_message, 'k'); pq_sendint64(_message, sentPtr); <=== Last sent WAL LSN pq_sendint64(_message, GetCurrentTimestamp()); pq_sendbyte(_message, requestReply ? 1 : 0); I'm not sure if anything more is required to keep empty transactions updated as part of synchronous replicas. If my understanding on this is not correct, let me know. regards, Ajin Cherian Fujitsu Australia
Re: logical replication empty transactions
On Mon, Apr 19, 2021 at 6:22 PM Peter Smith wrote: > > Here are a some review comments: > > -- > > 1. The patch v3 applied OK but with whitespace warnings > > [postgres@CentOS7-x64 oss_postgres_2PC]$ git apply > > ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch > > ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:98: > indent with spaces. > /* output BEGIN if we haven't yet, avoid for streaming and > non-transactional messages */ > > ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:99: > indent with spaces. > if (!data->xact_wrote_changes && !in_streaming && transactional) > warning: 2 lines add whitespace errors. > > -- > Fixed. > > 2. Please create a CF entry in [1] for this patch. > > -- > > 3. Patch comment > > The comment describes the problem and then suddenly just says > "Postpone the BEGIN message until the first change." > > I suggest changing it to say more like... "(blank line) This patch > addresses the above problem by postponing the BEGIN message until the > first change." > > -- > > Updated. > 4. pgoutput.h > > Maybe for consistency with the context member, the comment for the new > member should be to the right instead of above it? > > @@ -20,6 +20,9 @@ typedef struct PGOutputData > MemoryContext context; /* private memory context for transient > * allocations */ > > + /* flag indicating whether messages have previously been sent */ > + boolxact_wrote_changes; > + > > -- > > 5. pgoutput.h > > + /* flag indicating whether messages have previously been sent */ > > "previously been sent" --> "already been sent" ?? > > -- > > 6. pgoutput.h - misleading member name > > Actually, now that I have read all the rest of the code and how this > member is used I feel that this name is very misleading. e.g. For > "streaming" case then you still are writing changes but are not > setting this member at all - therefore it does not always mean what it > says. > > I feel a better name for this would be something like > "sent_begin_txn". Then if you have sent BEGIN it is true. If you > haven't sent BEGIN it is false. It eliminates all ambiguity naming it > this way instead. > > (This makes my feedback #5 redundant because the comment will be a bit > different if you do this). > > -- > Fixed above comments. > > 7. pgoutput.c - function pgoutput_begin_txn > > @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, > OutputPluginOptions *opt, > static void > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > { > > I guess that you still needed to pass the txn because that is how the > API is documented, right? > > But I am wondering if you ought to flag it as unused so you wont get > some BF machine giving warnings about it. > > e.g. Syntax like this? > > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN * txn) { > (void)txn; > ... > Updated. > -- > > 8. pgoutput.c - function pgoutput_begin_txn > > @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, > OutputPluginOptions *opt, > static void > pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) > { > + PGOutputData *data = ctx->output_plugin_private; > + > + /* > + * Don't send BEGIN message here. Instead, postpone it until the first > + * change. In logical replication, a common scenario is to replicate a set > + * of tables (instead of all tables) and transactions whose changes were > on > + * table(s) that are not published will produce empty transactions. These > + * empty transactions will send BEGIN and COMMIT messages to subscribers, > + * using bandwidth on something with little/no use for logical > replication. > + */ > + data->xact_wrote_changes = false; > + elog(LOG,"Holding of begin"); > +} > > Why is this loglevel LOG? Looks like leftover debugging. > Removed. > > -- > > 9. pgoutput.c - function pgoutput_commit_txn > > @@ -384,8 +401,14 @@ static void > pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, > XLogRecPtr commit_lsn) > { > + PGOutputData *data = ctx->output_plugin_private; > + > OutputPluginUpdateProgress(ctx); > > + /* skip COMMIT message if nothing was sent */ > + if (!data->xact_wrote_changes) > + return; > + > > In the case where you decided to do nothing does it make sense that > you still called the function OutputPluginUpdateProgress(ctx); ? > I thought perhaps that your new check should come first so this call > would never happen. > Even though the empty transaction is not sent, the LSN is tracked as decoded, hence the progress needs to be updated. > -- > > 10. pgoutput.c - variable declarations without casts > > + PGOutputData *data = ctx->output_plugin_private; > > I noticed the new stack variable you declare have no casts. > > This differs from the existing code which always looks like: > PGOutputData *data = (PGOutputData *)
Re: logical replication empty transactions
On Thu, Apr 15, 2021 at 4:39 PM Ajin Cherian wrote: > > > > On Thu, Apr 15, 2021 at 1:29 PM Ajin Cherian wrote: >> >> >> I've rebased the patch and made changes so that the patch supports >> "streaming in-progress transactions" and handling of logical decoding >> messages (transactional and non-transactional). >> I see that this patch not only makes sure that empty transactions are not >> sent but also does call OutputPluginUpdateProgress when an empty >> transaction is not sent, as a result the confirmed_flush_lsn is kept moving. >> I also see no hangs when synchronous_standby is configured. >> Do let me know your thoughts on this patch. REVIEW COMMENTS I applied this patch to today's HEAD and successfully ran "make check" and also the subscription TAP tests. Here are a some review comments: -- 1. The patch v3 applied OK but with whitespace warnings [postgres@CentOS7-x64 oss_postgres_2PC]$ git apply ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:98: indent with spaces. /* output BEGIN if we haven't yet, avoid for streaming and non-transactional messages */ ../patches_misc/v3-0001-Skip-empty-transactions-for-logical-replication.patch:99: indent with spaces. if (!data->xact_wrote_changes && !in_streaming && transactional) warning: 2 lines add whitespace errors. -- 2. Please create a CF entry in [1] for this patch. -- 3. Patch comment The comment describes the problem and then suddenly just says "Postpone the BEGIN message until the first change." I suggest changing it to say more like... "(blank line) This patch addresses the above problem by postponing the BEGIN message until the first change." -- 4. pgoutput.h Maybe for consistency with the context member, the comment for the new member should be to the right instead of above it? @@ -20,6 +20,9 @@ typedef struct PGOutputData MemoryContext context; /* private memory context for transient * allocations */ + /* flag indicating whether messages have previously been sent */ + boolxact_wrote_changes; + -- 5. pgoutput.h + /* flag indicating whether messages have previously been sent */ "previously been sent" --> "already been sent" ?? -- 6. pgoutput.h - misleading member name Actually, now that I have read all the rest of the code and how this member is used I feel that this name is very misleading. e.g. For "streaming" case then you still are writing changes but are not setting this member at all - therefore it does not always mean what it says. I feel a better name for this would be something like "sent_begin_txn". Then if you have sent BEGIN it is true. If you haven't sent BEGIN it is false. It eliminates all ambiguity naming it this way instead. (This makes my feedback #5 redundant because the comment will be a bit different if you do this). -- 7. pgoutput.c - function pgoutput_begin_txn @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, static void pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { I guess that you still needed to pass the txn because that is how the API is documented, right? But I am wondering if you ought to flag it as unused so you wont get some BF machine giving warnings about it. e.g. Syntax like this? pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN * txn) { (void)txn; ... -- 8. pgoutput.c - function pgoutput_begin_txn @@ -345,6 +345,23 @@ pgoutput_startup(LogicalDecodingContext *ctx, OutputPluginOptions *opt, static void pgoutput_begin_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn) { + PGOutputData *data = ctx->output_plugin_private; + + /* + * Don't send BEGIN message here. Instead, postpone it until the first + * change. In logical replication, a common scenario is to replicate a set + * of tables (instead of all tables) and transactions whose changes were on + * table(s) that are not published will produce empty transactions. These + * empty transactions will send BEGIN and COMMIT messages to subscribers, + * using bandwidth on something with little/no use for logical replication. + */ + data->xact_wrote_changes = false; + elog(LOG,"Holding of begin"); +} Why is this loglevel LOG? Looks like leftover debugging. -- 9. pgoutput.c - function pgoutput_commit_txn @@ -384,8 +401,14 @@ static void pgoutput_commit_txn(LogicalDecodingContext *ctx, ReorderBufferTXN *txn, XLogRecPtr commit_lsn) { + PGOutputData *data = ctx->output_plugin_private; + OutputPluginUpdateProgress(ctx); + /* skip COMMIT message if nothing was sent */ + if (!data->xact_wrote_changes) + return; + In the case where you decided to do nothing does it make sense that you still called the function OutputPluginUpdateProgress(ctx); ? I thought perhaps that your new check should come first so this call would never happen. -- 10. pgoutput.c - variable
Re: logical replication empty transactions
On Thu, Apr 15, 2021 at 1:29 PM Ajin Cherian wrote: > > I've rebased the patch and made changes so that the patch supports > "streaming in-progress transactions" and handling of logical decoding > messages (transactional and non-transactional). > I see that this patch not only makes sure that empty transactions are not > sent but also does call OutputPluginUpdateProgress when an empty > transaction is not sent, as a result the confirmed_flush_lsn is kept > moving. I also see no hangs when synchronous_standby is configured. > Do let me know your thoughts on this patch. > > Removed some debug logs and typos. regards, Ajin Cherian Fujitsu Australia v3-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Thu, Sep 17, 2020 at 3:29 PM Michael Paquier wrote: > On Wed, Jul 29, 2020 at 08:08:06PM +0530, Rahila Syed wrote: > > The make check passes. > > Since then, the patch is failing to apply, waiting on author and the > thread has died 6 weeks or so ago, so I am marking it as RwF in the > CF. > > I've rebased the patch and made changes so that the patch supports "streaming in-progress transactions" and handling of logical decoding messages (transactional and non-transactional). I see that this patch not only makes sure that empty transactions are not sent but also does call OutputPluginUpdateProgress when an empty transaction is not sent, as a result the confirmed_flush_lsn is kept moving. I also see no hangs when synchronous_standby is configured. Do let me know your thoughts on this patch. regards, Ajin Cherian Fujitsu Australia v2-0001-Skip-empty-transactions-for-logical-replication.patch Description: Binary data
Re: logical replication empty transactions
On Wed, Jul 29, 2020 at 08:08:06PM +0530, Rahila Syed wrote: > The make check passes. Since then, the patch is failing to apply, waiting on author and the thread has died 6 weeks or so ago, so I am marking it as RwF in the CF. -- Michael signature.asc Description: PGP signature
Re: logical replication empty transactions
Hi, Please see below review of the 0001-Skip-empty-transactions-for-logical-replication.patch The make check passes. + /* output BEGIN if we haven't yet */ + if (!data->xact_wrote_changes) + pgoutput_begin(ctx, txn); + + data->xact_wrote_changes = true; + IMO, xact_wrote_changes flag is better set inside the if condition as it does not need to be set repeatedly in subsequent calls to the same function. * Stash BEGIN data in plugin's LogicalDecodingContext.output_plugin_private when plugin's begin callback called, don't write anything to the outstream * Write out BEGIN message lazily when any other callback generates a message that does need to be written out * If no BEGIN written by the time COMMIT callback called, discard the COMMIT too. Check if sync rep enabled. if it is, call LogicalDecodingContext.update_progress from within the output plugin commit handler, otherwise just ignore the commit totally. Probably by calling OutputPluginUpdateProgress(). I think the code in the patch is similar to what has been described by Craig in the above snippet, except instead of stashing the BEGIN message and sending the message lazily, it simply maintains a flag in LogicalDecodingContext.output_plugin_private which defers calling output plugin's begin callback, until any other callback actually generates a remote write. Also, the patch does not contain the last part where he describes having OutputPluginUpdateProgress() for synchronous replication enabled transactions. However, some basic testing suggests that the patch does not have any notable adverse effect on either the replication lag or the sync_rep performance. I performed tests by setting up publisher and subscriber on the same machine with synchronous_commit = on and ran pgbench -c 12 -j 6 -T 300 on unpublished pgbench tables. I see that confirmed_flush_lsn is catching up just fine without any notable delay as compared to the test results without the patch. Also, the TPS for synchronous replication of empty txns with and without the patch remains similar. Having said that, these are initial findings and I understand better performance tests are required to measure reduction in consumption of network bandwidth and impact on synchronous replication and replication lag. Thank you, Rahila Syed
Re: logical replication empty transactions
Sorry, I replied in the wrong thread. Please ignore above mail. > >
Re: logical replication empty transactions
The patch no longer applies, because of additions in the test source. Otherwise, I have tested the patch and confirmed that updates and deletes on tables with deferred primary keys work with logical replication. The new status of this patch is: Waiting on Author
Re: logical replication empty transactions
On Tue, 10 Mar 2020 at 02:30, Andres Freund wrote: > Hi, > > On 2020-03-06 13:53:02 +0800, Craig Ringer wrote: > > On Mon, 2 Mar 2020 at 19:26, Amit Kapila > wrote: > > > > > One thing that is not clear to me is how will we advance restart_lsn > > > if we don't send any empty xact in a system where there are many such > > > xacts? > > > > Same way we already do it for writes that are not replicated over > > logical replication, like vacuum work etc. The upstream sends feedback > > with reply-requested. The downstream replies. The upstream advances > > confirmed_flush_lsn, and that lazily updates restart_lsn. > > It'll still delay it a bit. > Right, but we don't generally care because there's no sync rep txn waiting for confirmation. If we lose progress due to a crash it doesn't matter. It does delay removal of old WAL a little, but it hardly matters. > Somewhat independent from the issue at hand: It'd be really good if we > could evolve the syncrep framework to support per-database waiting... It > shouldn't be that hard, and the current situation sucks quite a bit (and > yes, I'm to blame). > Hardly, you just didn't get the chance to fix that on top of the umpteen other things you had to change to make all the logical stuff work. You didn't break it, just didn't implement every single possible enhancement all at once. Shocking, I tell you. I'm not quite sure what you mean by "poke the walsender"? Kinda sounds > like sending a signal, but decoding happens inside after the walsender, > so there's no need for that. Do you just mean somehow requesting that > walsender sends a feedback message? > Right. I had in mind something like sending a ProcSignal via our funky multiplexed signal mechanism to ask the walsender to immediately generate a keepalive message with a reply-requested flag, then set the walsender's latch so we wake it promptly. > To address the volume we could: > > 1a) Introduce a pgoutput message type to indicate that the LSN has > advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is > 21 bytes, COMMIT is 26. But we really don't need that much here. A > single message should do the trick. > It would. Is it worth caring though? Especially since it seems rather unlikely that the actual network data volume of begin/commit msgs will be much of a concern. It's not like we're PITRing logical streams, and if we did, we could just filter out empty commits on the receiver side. That message pretty much already exists in the form of a walsender keepalive anyway so we might as well re-use that and not upset the protocol. > 1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and > possibly rename) that indicates that we are intentionally "ignoring" > WAL. For walsender that callback then could check if it could just > forward the position of the client (if it was entirely caught up > before), or if it should send a feedback request (if syncrep is > enabled, or distance is big). > I can see something like that being very useful, because at present only the output plugin knows if a txn is "empty" as far as that particular slot and output plugin is concerned. The reorder buffering mechanism cannot do relation-level filtering before it sends the changes to the output plugin during ReorderBufferCommit, since it only knows about relfilenodes not relation oids. And the output plugin might be doing finer grained filtering using row-filter expressions or who knows what else. But as described above that will only help for txns done in DBs other than the one the logical slot is for or txns known to have an empty ReorderBuffer when the commit is seen. If there's a txn in the slot's db with a non-empty reorderbuffer, the output plugin won't know if the txn is empty or not until it finishes processing all callbacks and sees the commit for the txn. So it will generally have emitted the Begin message on the wire by the time it knows it has nothing useful to say. And Pg won't know that this txn is empty as far as this output plugin with this particular slot, set of output plugin params, and current user-catalog state is concerned, so it won't have any way to call the output plugin's "update progress" callback instead of the usual begin/change/commit callbacks. But I think we can already skip empty txns unless sync-rep is enabled with no core changes, and send empty txns as walsender keepalives instead, by altering only output plugins, like this: * Stash BEGIN data in plugin's LogicalDecodingContext.output_plugin_private when plugin's begin callback called, don't write anything to the outstream * Write out BEGIN message lazily when any other callback generates a message that does need to be written out * If no BEGIN written by the time COMMIT callback called, discard the COMMIT too. Check if sync rep enabled. if it is, call LogicalDecodingContext.update_progress from within the output plugin commit handler, otherwise just ignore the commit totally. Probably by
Re: logical replication empty transactions
Hi, On 2020-03-06 13:53:02 +0800, Craig Ringer wrote: > On Mon, 2 Mar 2020 at 19:26, Amit Kapila wrote: > > > One thing that is not clear to me is how will we advance restart_lsn > > if we don't send any empty xact in a system where there are many such > > xacts? > > Same way we already do it for writes that are not replicated over > logical replication, like vacuum work etc. The upstream sends feedback > with reply-requested. The downstream replies. The upstream advances > confirmed_flush_lsn, and that lazily updates restart_lsn. It'll still delay it a bit. > The bigger issue here is that if you don't send empty txns on logical > replication you don't get an eager, timely response from the > replica(s), which delays synchronous replication. You need to send > empty txns when synchronous replication is enabled, or instead poke > the walsender to force immediate feedback with reply requested. Somewhat independent from the issue at hand: It'd be really good if we could evolve the syncrep framework to support per-database waiting... It shouldn't be that hard, and the current situation sucks quite a bit (and yes, I'm to blame). I'm not quite sure what you mean by "poke the walsender"? Kinda sounds like sending a signal, but decoding happens inside after the walsender, so there's no need for that. Do you just mean somehow requesting that walsender sends a feedback message? To address the volume we could: 1a) Introduce a pgoutput message type to indicate that the LSN has advanced, without needing separate BEGIN/COMMIT. Right now BEGIN is 21 bytes, COMMIT is 26. But we really don't need that much here. A single message should do the trick. 1b) Add a LogicalOutputPluginWriterUpdateProgress parameter (and possibly rename) that indicates that we are intentionally "ignoring" WAL. For walsender that callback then could check if it could just forward the position of the client (if it was entirely caught up before), or if it should send a feedback request (if syncrep is enabled, or distance is big). 2) Reduce the rate of 'empty transaction'/feedback request messages. If we know that we're not going to be blocked waiting for more WAL, or blocked sending messages out to the network, we don't immediately need to send out the messages. Instead we could continue decoding until there's actual data, or until we're going to get blocked. We could e.g. have a new LogicalDecodingContext callback that is called whenever WalSndWaitForWal() would wait. That'd check if there's a pending "need" to send out a 'empty transaction'/feedback request message. The "need" flag would get cleared whenever we send out data bearing an LSN for other reasons. Greetings, Andres Freund
Re: logical replication empty transactions
On Mon, 2 Mar 2020 at 19:26, Amit Kapila wrote: > One thing that is not clear to me is how will we advance restart_lsn > if we don't send any empty xact in a system where there are many such > xacts? Same way we already do it for writes that are not replicated over logical replication, like vacuum work etc. The upstream sends feedback with reply-requested. The downstream replies. The upstream advances confirmed_flush_lsn, and that lazily updates restart_lsn. The bigger issue here is that if you don't send empty txns on logical replication you don't get an eager, timely response from the replica(s), which delays synchronous replication. You need to send empty txns when synchronous replication is enabled, or instead poke the walsender to force immediate feedback with reply requested. -- Craig Ringer http://www.2ndQuadrant.com/ 2ndQuadrant - PostgreSQL Solutions for the Enterprise
Re: logical replication empty transactions
On Thu, 5 Mar 2020 at 05:45, Amit Kapila wrote: > Euler, can we try to update the patch based on the number of > transactions threshold and see how it works? > > I will do. -- Euler Taveira http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: logical replication empty transactions
On Wed, Mar 4, 2020 at 4:04 PM Dilip Kumar wrote: > > On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila wrote: > > > > On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar wrote: > > > > > > On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila > > > wrote: > > > > > > > > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar > > > > wrote: > > > > > > > > > > > > > > > IMHO, the threshold should be based on the commit LSN. Our main > > > > > reason we want to send empty transactions after a certain > > > > > transaction/duration is that we want the restart_lsn to be moving > > > > > forward so that if we need to restart the replication slot we don't > > > > > need to process a lot of extra WAL. So assume we set the threshold > > > > > based on transaction count then there is still a possibility that we > > > > > might process a few very big transactions then we will have to process > > > > > them again after the restart. > > > > > > > > > > > > > Won't the subscriber eventually send the flush location for the large > > > > transactions which will move the restart_lsn? > > > > > > I meant large empty transactions (basically we can not send anything > > > to the subscriber). So my point was if there are only large > > > transactions in the system which we can not stream because those > > > tables are not published. Then keeping threshold based on transaction > > > count will not help much because even if we don't reach the > > > transaction count threshold, we still might need to process a lot of > > > data if we don't stream the commit for the empty transactions. So > > > instead of tracking transaction count can we track LSN, and LSN > > > different since we last stream some change cross the threshold then we > > > will stream the next empty transaction. > > > > > > > You have a point and it may be better to keep threshold based on LSN > > if we want to keep any threshold, but keeping on transaction count > > seems to be a bit straightforward. Let us see if anyone else has any > > opinion on this matter? > > Ok, that make sense. > Euler, can we try to update the patch based on the number of transactions threshold and see how it works? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Wed, Mar 4, 2020 at 3:47 PM Amit Kapila wrote: > > On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar wrote: > > > > On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila wrote: > > > > > > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar wrote: > > > > > > > > > > > > IMHO, the threshold should be based on the commit LSN. Our main > > > > reason we want to send empty transactions after a certain > > > > transaction/duration is that we want the restart_lsn to be moving > > > > forward so that if we need to restart the replication slot we don't > > > > need to process a lot of extra WAL. So assume we set the threshold > > > > based on transaction count then there is still a possibility that we > > > > might process a few very big transactions then we will have to process > > > > them again after the restart. > > > > > > > > > > Won't the subscriber eventually send the flush location for the large > > > transactions which will move the restart_lsn? > > > > I meant large empty transactions (basically we can not send anything > > to the subscriber). So my point was if there are only large > > transactions in the system which we can not stream because those > > tables are not published. Then keeping threshold based on transaction > > count will not help much because even if we don't reach the > > transaction count threshold, we still might need to process a lot of > > data if we don't stream the commit for the empty transactions. So > > instead of tracking transaction count can we track LSN, and LSN > > different since we last stream some change cross the threshold then we > > will stream the next empty transaction. > > > > You have a point and it may be better to keep threshold based on LSN > if we want to keep any threshold, but keeping on transaction count > seems to be a bit straightforward. Let us see if anyone else has any > opinion on this matter? Ok, that make sense. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Wed, Mar 4, 2020 at 11:16 AM Dilip Kumar wrote: > > On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila wrote: > > > > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar wrote: > > > > > > > > > IMHO, the threshold should be based on the commit LSN. Our main > > > reason we want to send empty transactions after a certain > > > transaction/duration is that we want the restart_lsn to be moving > > > forward so that if we need to restart the replication slot we don't > > > need to process a lot of extra WAL. So assume we set the threshold > > > based on transaction count then there is still a possibility that we > > > might process a few very big transactions then we will have to process > > > them again after the restart. > > > > > > > Won't the subscriber eventually send the flush location for the large > > transactions which will move the restart_lsn? > > I meant large empty transactions (basically we can not send anything > to the subscriber). So my point was if there are only large > transactions in the system which we can not stream because those > tables are not published. Then keeping threshold based on transaction > count will not help much because even if we don't reach the > transaction count threshold, we still might need to process a lot of > data if we don't stream the commit for the empty transactions. So > instead of tracking transaction count can we track LSN, and LSN > different since we last stream some change cross the threshold then we > will stream the next empty transaction. > You have a point and it may be better to keep threshold based on LSN if we want to keep any threshold, but keeping on transaction count seems to be a bit straightforward. Let us see if anyone else has any opinion on this matter? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Wed, Mar 4, 2020 at 10:50 AM Amit Kapila wrote: > > On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar wrote: > > > > On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila wrote: > > > > > > On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira > > > wrote: > > > > > > > > On Tue, 3 Mar 2020 at 05:24, Amit Kapila > > > > wrote: > > > >> > > > >> > > > >> Another idea could be that we stream the transaction after some > > > >> threshold number (say 100 or anything we think is reasonable) of empty > > > >> xacts. This will reduce the traffic without tinkering with the core > > > >> design too much. > > > >> > > > >> > > > > Amit, I suggest an interval to control this setting. Time is something > > > > we have control; transactions aren't (depending on workload). > > > > pg_stat_replication query interval usually is not milliseconds, > > > > however, you can execute thousands of transactions in a second. If we > > > > agree on that idea I can add it to the patch. > > > > > > > > > > Do you mean to say that if for some threshold interval we didn't > > > stream any transaction, then we can send the next empty transaction to > > > the subscriber? If so, then isn't it possible that the empty xacts > > > happen irregularly after the specified interval and then we still end > > > up sending them all. I might be missing something here, so can you > > > please explain your idea in detail? Basically, how will it work and > > > how will it solve the problem. > > > > IMHO, the threshold should be based on the commit LSN. Our main > > reason we want to send empty transactions after a certain > > transaction/duration is that we want the restart_lsn to be moving > > forward so that if we need to restart the replication slot we don't > > need to process a lot of extra WAL. So assume we set the threshold > > based on transaction count then there is still a possibility that we > > might process a few very big transactions then we will have to process > > them again after the restart. > > > > Won't the subscriber eventually send the flush location for the large > transactions which will move the restart_lsn? I meant large empty transactions (basically we can not send anything to the subscriber). So my point was if there are only large transactions in the system which we can not stream because those tables are not published. Then keeping threshold based on transaction count will not help much because even if we don't reach the transaction count threshold, we still might need to process a lot of data if we don't stream the commit for the empty transactions. So instead of tracking transaction count can we track LSN, and LSN different since we last stream some change cross the threshold then we will stream the next empty transaction. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Wed, Mar 4, 2020 at 9:52 AM Dilip Kumar wrote: > > On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila wrote: > > > > On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira > > wrote: > > > > > > On Tue, 3 Mar 2020 at 05:24, Amit Kapila wrote: > > >> > > >> > > >> Another idea could be that we stream the transaction after some > > >> threshold number (say 100 or anything we think is reasonable) of empty > > >> xacts. This will reduce the traffic without tinkering with the core > > >> design too much. > > >> > > >> > > > Amit, I suggest an interval to control this setting. Time is something we > > > have control; transactions aren't (depending on workload). > > > pg_stat_replication query interval usually is not milliseconds, however, > > > you can execute thousands of transactions in a second. If we agree on > > > that idea I can add it to the patch. > > > > > > > Do you mean to say that if for some threshold interval we didn't > > stream any transaction, then we can send the next empty transaction to > > the subscriber? If so, then isn't it possible that the empty xacts > > happen irregularly after the specified interval and then we still end > > up sending them all. I might be missing something here, so can you > > please explain your idea in detail? Basically, how will it work and > > how will it solve the problem. > > IMHO, the threshold should be based on the commit LSN. Our main > reason we want to send empty transactions after a certain > transaction/duration is that we want the restart_lsn to be moving > forward so that if we need to restart the replication slot we don't > need to process a lot of extra WAL. So assume we set the threshold > based on transaction count then there is still a possibility that we > might process a few very big transactions then we will have to process > them again after the restart. > Won't the subscriber eventually send the flush location for the large transactions which will move the restart_lsn? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Wed, Mar 4, 2020 at 9:12 AM Amit Kapila wrote: > > On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira > wrote: > > > > On Tue, 3 Mar 2020 at 05:24, Amit Kapila wrote: > >> > >> > >> Another idea could be that we stream the transaction after some > >> threshold number (say 100 or anything we think is reasonable) of empty > >> xacts. This will reduce the traffic without tinkering with the core > >> design too much. > >> > >> > > Amit, I suggest an interval to control this setting. Time is something we > > have control; transactions aren't (depending on workload). > > pg_stat_replication query interval usually is not milliseconds, however, > > you can execute thousands of transactions in a second. If we agree on that > > idea I can add it to the patch. > > > > Do you mean to say that if for some threshold interval we didn't > stream any transaction, then we can send the next empty transaction to > the subscriber? If so, then isn't it possible that the empty xacts > happen irregularly after the specified interval and then we still end > up sending them all. I might be missing something here, so can you > please explain your idea in detail? Basically, how will it work and > how will it solve the problem. IMHO, the threshold should be based on the commit LSN. Our main reason we want to send empty transactions after a certain transaction/duration is that we want the restart_lsn to be moving forward so that if we need to restart the replication slot we don't need to process a lot of extra WAL. So assume we set the threshold based on transaction count then there is still a possibility that we might process a few very big transactions then we will have to process them again after the restart. OTOH, if we set based on an interval then even if there is not much work going on, still we end up sending the empty transaction as pointed by Amit. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Wed, Mar 4, 2020 at 7:17 AM Euler Taveira wrote: > > On Tue, 3 Mar 2020 at 05:24, Amit Kapila wrote: >> >> >> Another idea could be that we stream the transaction after some >> threshold number (say 100 or anything we think is reasonable) of empty >> xacts. This will reduce the traffic without tinkering with the core >> design too much. >> >> > Amit, I suggest an interval to control this setting. Time is something we > have control; transactions aren't (depending on workload). > pg_stat_replication query interval usually is not milliseconds, however, you > can execute thousands of transactions in a second. If we agree on that idea I > can add it to the patch. > Do you mean to say that if for some threshold interval we didn't stream any transaction, then we can send the next empty transaction to the subscriber? If so, then isn't it possible that the empty xacts happen irregularly after the specified interval and then we still end up sending them all. I might be missing something here, so can you please explain your idea in detail? Basically, how will it work and how will it solve the problem. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Tue, 3 Mar 2020 at 05:24, Amit Kapila wrote: > > Another idea could be that we stream the transaction after some > threshold number (say 100 or anything we think is reasonable) of empty > xacts. This will reduce the traffic without tinkering with the core > design too much. > > > Amit, I suggest an interval to control this setting. Time is something we have control; transactions aren't (depending on workload). pg_stat_replication query interval usually is not milliseconds, however, you can execute thousands of transactions in a second. If we agree on that idea I can add it to the patch. Regards, -- Euler Taveira http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: logical replication empty transactions
On Tue, Mar 3, 2020 at 2:17 PM Dilip Kumar wrote: > > On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila wrote: > > > > On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar wrote: > > > > > > On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila > > > wrote: > > > > > > > > > > > > One thing that is not clear to me is how will we advance restart_lsn > > > > if we don't send any empty xact in a system where there are many such > > > > xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn > > > > sent by subscriber. After this change, the subscriber won't be able > > > > to send the confirmed_flush and for a long time, we won't be able to > > > > advance restart_lsn. Is that correct, if so, why do we think that is > > > > acceptable? One might argue that restart_lsn will be advanced as soon > > > > as we send the first non-empty xact, but not sure if that is good > > > > enough. What do you think? > > > > > > It seems like a valid point. One idea could be that we can track the > > > last commit LSN which we streamed and if the confirmed flush location > > > is already greater than that then even if we skip the sending the > > > commit message we can increase the confirm flush location locally. > > > Logically, it should not cause any problem because once we have got > > > the confirmation for whatever we have streamed so far. So for other > > > commits(which we are skipping), we can we advance it locally because > > > we are sure that we don't have any streamed commit which is not yet > > > confirmed by the subscriber. > > > > > > > Will this work after restart? Do you want to persist the information > > of last streamed commit LSN? > > We will not persist the last streamed commit LSN, this variable is in > memory just to track whether we have got confirmation up to that > location or not, once we have confirmation up to that location and if > we are not streaming any transaction (because those are empty > transactions) then we can just advance the confirmed flush location > and based on that we can update the restart point as well and those > will be persisted. Basically, "last streamed commit LSN" is just a > marker that their still something pending to be confirmed from the > subscriber so until that we can not simply advance the confirm flush > location or restart point based on the empty transactions. But, if > there is nothing pending to be confirmed we can advance. So if we are > streaming then we will get confirmation from subscriber otherwise we > can advance it locally. So, in either case, the confirmed flush > location and restart point will keep moving. > Okay, so this might work out, but it might look a bit ad-hoc. > > > > > This is just my thought, but if we > > > think from the code and design perspective then it might complicate > > > the things and sounds hackish. > > > > > > > Another idea could be that we stream the transaction after some > > threshold number (say 100 or anything we think is reasonable) of empty > > xacts. This will reduce the traffic without tinkering with the core > > design too much. > > Yeah, this could be also an option. > Okay. Peter E, Petr J, others, do you have any opinion on what is the best way forward for this thread? I think it would be really good if we can reduce the network traffic due to these empty transactions. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: logical replication empty transactions
On Tue, Mar 3, 2020 at 1:54 PM Amit Kapila wrote: > > On Tue, Mar 3, 2020 at 9:35 AM Dilip Kumar wrote: > > > > On Mon, Mar 2, 2020 at 4:56 PM Amit Kapila wrote: > > > > > > > > > One thing that is not clear to me is how will we advance restart_lsn > > > if we don't send any empty xact in a system where there are many such > > > xacts? IIRC, the restart_lsn is advanced based on confirmed_flush lsn > > > sent by subscriber. After this change, the subscriber won't be able > > > to send the confirmed_flush and for a long time, we won't be able to > > > advance restart_lsn. Is that correct, if so, why do we think that is > > > acceptable? One might argue that restart_lsn will be advanced as soon > > > as we send the first non-empty xact, but not sure if that is good > > > enough. What do you think? > > > > It seems like a valid point. One idea could be that we can track the > > last commit LSN which we streamed and if the confirmed flush location > > is already greater than that then even if we skip the sending the > > commit message we can increase the confirm flush location locally. > > Logically, it should not cause any problem because once we have got > > the confirmation for whatever we have streamed so far. So for other > > commits(which we are skipping), we can we advance it locally because > > we are sure that we don't have any streamed commit which is not yet > > confirmed by the subscriber. > > > > Will this work after restart? Do you want to persist the information > of last streamed commit LSN? We will not persist the last streamed commit LSN, this variable is in memory just to track whether we have got confirmation up to that location or not, once we have confirmation up to that location and if we are not streaming any transaction (because those are empty transactions) then we can just advance the confirmed flush location and based on that we can update the restart point as well and those will be persisted. Basically, "last streamed commit LSN" is just a marker that their still something pending to be confirmed from the subscriber so until that we can not simply advance the confirm flush location or restart point based on the empty transactions. But, if there is nothing pending to be confirmed we can advance. So if we are streaming then we will get confirmation from subscriber otherwise we can advance it locally. So, in either case, the confirmed flush location and restart point will keep moving. > > > This is just my thought, but if we > > think from the code and design perspective then it might complicate > > the things and sounds hackish. > > > > Another idea could be that we stream the transaction after some > threshold number (say 100 or anything we think is reasonable) of empty > xacts. This will reduce the traffic without tinkering with the core > design too much. Yeah, this could be also an option. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com