Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Kuroda-san, I did not apply these v12* patches, but I have diff'ed all of them with the previous v11* patches and confirmed that all of my previous review comments now seem to be addressed. I don't have any more comments to make at this time. Thanks! == Kind Regards, Peter Smith. Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are my remaining review comments for the latest v11* patches. // v11-0001 // No changes. No comments. // v11-0002 // == doc/src/sgml/ref/alter_subscription.sgml 2.1. ALTER SUBSCRIPTION ... SET (failover = on|off) and - ALTER SUBSCRIPTION ... SET (two_phase = on|off) + ALTER SUBSCRIPTION ... SET (two_phase = off) My other thread patch has already been pushed [1], so now you can modify this to say "true|false" as previously suggested. // v11-0003 // == src/backend/commands/subscriptioncmds.c 3.1. AlterSubscription + subtwophase = LOGICALREP_TWOPHASE_STATE_DISABLED; + } + else + subtwophase = LOGICALREP_TWOPHASE_STATE_PENDING; + + /* Change system catalog acoordingly */ values[Anum_pg_subscription_subtwophasestate - 1] = - CharGetDatum(opts.twophase ? - LOGICALREP_TWOPHASE_STATE_PENDING : - LOGICALREP_TWOPHASE_STATE_DISABLED); + CharGetDatum(subtwophase); replaces[Anum_pg_subscription_subtwophasestate - 1] = true; Sorry, I don't think that 'subtwophase' change is an improvement -- IMO the existing ternary code was fine as-is. I only reported the excessive flag checking in the previous v10-0003 review because of some extra "if (!opts.twophase)" code but that was caused by what you called "wrong git operations." and is already fixed now. // v11-0004 // == src/sgml/catalogs.sgml 4.1. + + + subforcealter bool + + + If true, then the ALTER SUBSCRIPTION + can disable two_phase option, even if there are + uncommitted prepared transactions from when two_phase + was enabled + + + I think this description should be changed to say what it *really* does. IMO, the stuff about 'two_phase' is more like a side-effect. SUGGESTION (this is just pseudo-code. You can add the CREATE SUBSCRIPTION 'force_alter' link appropriately) If true, then the ALTER SUBSCRIPTION command can sometimes be forced to proceed instead of giving an error. See force_alter parameter for details about when this might be useful. == [1] https://github.com/postgres/postgres/commit/fa65a022db26bc446fb67ce1d7ac543fa4bb72e4 Kind Regards, Peter Smith. Fujitsu Australia
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, > You wrote "Fixed" for that patch v9-0004 suggestion but I don't think > anything was changed at all. Accidentally missed? Sorry, I missed to do `git add` after the revision. The change was also included in new patch [1]. [1]: https://www.postgresql.org/message-id/OSBPR01MB25522052F9F3E3AAD3BA2A8CF5ED2%40OSBPR01MB2552.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Tue, May 14, 2024 at 10:03 PM Hayato Kuroda (Fujitsu) wrote: > > Dear Peter, > ... > > 4.11. > > > > +# Alter the two_phase with the force_alter option. Apart from the the last > > +# ALTER SUBSCRIPTION command, the command will abort the prepared > > transaction > > +# and succeed. > > > > There is typo "the the" and the wording is a bit strange. Why not just say: > > > > SUGGESTION > > Alter the two_phase true to false with the force_alter option enabled. > > This command will succeed after aborting the prepared transaction. > > Fixed. > You wrote "Fixed" for that patch v9-0004 suggestion but I don't think anything was changed at all. Accidentally missed? == Kind Regards, Peter Smith. Futjisu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Kuroda-san. Here are my review comments for latest v10* patches. // patch v10-0001 // No changes. No comments. // patch v10-0002 // == Commit message 2.1. Regarding the false->true case, the backend process alters the subtwophase to LOGICALREP_TWOPHASE_STATE_PENDING once. After the subscription is enabled, a new logical replication worker requests to change the two_phase option of its slot from pending to true after the initial data synchronization is done. The code path is the same as the case in which two_phase is initially set to true, so there is no need to do something remarkable. However, for the true->false case, the backend must connect to the publisher and expressly change the parameter because the apply worker does not alter the option to false. The operation cannot be rolled back, and altering the parameter from "true" to "false" within a transaction is prohibited. ~ BEFORE The operation cannot be rolled back, and altering the parameter from "true" to "false" within a transaction is prohibited. SUGGESTION Because this operation cannot be rolled back, altering the two_phase parameter from "true" to "false" within a transaction is prohibited. == doc/src/sgml/ref/alter_subscription.sgml 2.2. ALTER SUBSCRIPTION ... SET (failover = on|off) and - ALTER SUBSCRIPTION ... SET (two_phase = on|off) + ALTER SUBSCRIPTION ... SET (two_phase = off) I wasn't sure why you chose to keep on|off here instead of true|false, since in subsequence patch 0003 you changed it true/false everywhere as discussed in previous reviews. OTOH if you only did this to be consistent with the "failover=on|off" then that is OK; but in that case I might raise a separate hackers thread to propose those should also be changed to true|false for consistency with the parameer listed on the CREATE SUBSCRIPTION page. What do you think? == src/backend/commands/subscriptioncmds.c 2.3. /* - * The changed two_phase option of the slot can't be rolled - * back. + * Altering the parameter from "true" to "false" within a + * transaction is prohibited. Since the apply worker does + * not alter the slot option to false, the backend must + * connect to the publisher and expressly change the + * parameter. + * + * There is no need to do something remarkable regarding + * the "false" to "true" case; the backend process alters + * subtwophase to LOGICALREP_TWOPHASE_STATE_PENDING once. + * After the subscription is enabled, a new logical + * replication worker requests to change the two_phase + * option of its slot when the initial data synchronization + * is done. The code path is the same as the case in which + * two_phase is initially set to true. */ BEFORE ...worker requests to change the two_phase option of its slot when... SUGGESTION ...worker requests to change the two_phase option of its slot from pending to true when... == src/test/subscription/t/099_twophase_added.pl 2.4. +# +# Check the case that prepared transactions exist on the publisher node. +# +# Since the two_phase is "off", then normally, this PREPARE will do nothing +# until the COMMIT PREPARED, but in this test, we toggle the two_phase to +# "true" again before the COMMIT PREPARED happens. /Since the two_phase is "off"/Since the two_phase is "false"/ // patch v10-0003 // == src/backend/commands/subscriptioncmds.c 3.1. AlterSubscription + * If two_phase was enabled, there is a possibility that + * transactions have already been PREPARE'd. They must be + * checked and rolled back. */ if (!opts.twophase) I think it will less ambiguous if you modify this to say "If two_phase was previously enabled" ~~~ 3.2. if (!opts.twophase) { List *prepared_xacts; /* * Altering the parameter from "true" to "false" within * a transaction is prohibited. Since the apply worker * does not alter the slot option to false, the backend * must connect to the publisher and expressly change * the parameter. * * There is no need to do something remarkable * regarding the "false" to "true" case; the backend * process alters subtwophase to * LOGICALREP_TWOPHASE_STATE_PENDING once. After the * subscription is enabled, a new logical replication * worker requests to change the two_phase option of * its slot when the initial data synchronization is * done. The code path is the same as the case in which * two_phase is initially set to true. */ if (!opts.twophase) PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase = false)"); /* * To prevent prepared transactions from being * isolated, they must manually be aborted. */ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && (prepared_xacts = GetGidListBySubid(subid)) != NIL) { /* Abort all listed transactions */ foreach_ptr(char, gid, prepared_xacts) FinishPreparedTransaction(gid, false); list_free_deep(prepared_xacts); } } /* Change system catalog acoordingly */
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, Thanks for reviewing! New patch is available in [1]. > I'm having second thoughts about how these patches mention the option > values "on|off". These are used in the ALTER SUBSCRIPTION document > page for 'two_phase' and 'failover' parameters, and then those > "on|off" get propagated to the code comments, error messages, and > tests... > > Now I see that on the CREATE SUBSCRIPTION page [1], every boolean > parameter (even including 'two_phase' and 'failover') is described in > terms of "true|false" (not "on|off"). Hmm. But I could sentences like "The default value is off,...". Also, in alter_subscription.sgml, "on|off" notation has already been used. Not sure, but I felt there are no rules around here. > In hindsight, it is probably better to refer only to true|false > everywhere for these boolean parameters, instead of sometimes using > different values like on|off. > > What do you think? It's OK for me to make message/code comments consistent. Not sure the documentation, but followed only my part. [1]: https://www.postgresql.org/message-id/OSBPR01MB2552F66463EFCFD654E87C09F5E32%40OSBPR01MB2552.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Kuroda-san, Here are some review comments for all patches v9* // Patch v9-0001 // There were no changes since v8-0001, so no comments. // Patch v9-0002 // == Commit Message 2.1. Regarding the off->on case, the logical replication already has a mechanism for it, so there is no need to do anything special for the on->off case; however, we must connect to the publisher and expressly change the parameter. The operation cannot be rolled back, and altering the parameter from "on" to "off" within a transaction is prohibited. In the opposite case, there is no need to prevent this because the logical replication worker already had the mechanism to alter the slot option at a convenient time. ~ This explanation seems to be going around in circles, without giving any new information: AFAICT, "Regarding the off->on case, the logical replication already has a mechanism for it, so there is no need to do anything special for the on->off case;" is saying pretty much the same as: "In the opposite case, there is no need to prevent this because the logical replication worker already had the mechanism to alter the slot option at a convenient time." But, what I hoped for in previous review comments was an explanation somewhat less vague than "already has a mechanism" or "already had the mechanism". Can't this have just 1 or 2 lines to say WHAT is that existing mechanism for the "off" to "on" case, and WHY that means there is nothing special to do in that scenario? == src/backend/commands/subscriptioncmds.c 2.2. AlterSubscription /* - * The changed two_phase option of the slot can't be rolled - * back. + * Since altering the two_phase option of subscriptions + * also leads to changing the slot option, this command + * cannot be rolled back. So prevent this if we are in a + * transaction block. In the opposite case, there is no + * need to prevent this because the logical replication + * worker already had the mechanism to alter the slot + * option at a convenient time. */ (Same previous review comments, and same as my review comment for the commit message above). I don't think "already had the mechanism" is enough explanation. Also, the 2nd sentence doesn't make sense here because the comment only said "altering the slot option" -- it didn't say it was altering it to "on" or altering it to "off", so "the opposite case" has no meaning. ~~~ 2.3. AlterSubscription /* - * Try to acquire the connection necessary for altering slot. + * Check the need to alter the replication slot. Failover and two_phase + * options are controlled by both the publisher (as a slot option) and the + * subscriber (as a subscription option). The slot option must be altered + * only when changing "on" to "off". Because in opposite case, the logical + * replication worker already has the mechanism to do so at a convenient + * time. + */ + update_failover = replaces[Anum_pg_subscription_subfailover - 1]; + update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] && + !opts.twophase); This is again the same as other review comments above. Probably, when some better explanation can be found for "already has the mechanism to do so at a convenient time." then all of these places can be changed using similar text. // Patch v9-0003 // There are some imperfect code comments but AFAIK they are the same ones from patch 0002. I think patch 0003 is just moving those comments to different places, so probably they would already be addressed by patch 0002. // Patch v9-0004 // == doc/src/sgml/catalogs.sgml 4.1. + + + subforcealter bool + + + If true, the subscription can be altered two_phase + option, even if there are prepared transactions + + + BEFORE If true, the subscription can be altered two_phase option, even if there are prepared transactions SUGGESTION If true, then the ALTER SUBSCRIPTION command can disable two_phase option, even if there are uncommitted prepared transactions from when two_phase was enabled == doc/src/sgml/ref/alter_subscription.sgml 4.2. - - - The two_phase parameter can only be altered when the - subscription is disabled. When altering the parameter from on - to off, the backend process checks for any incomplete - prepared transactions done by the logical replication worker (from when - two_phase parameter was still on) - and, if any are found, those are aborted. - Well, I still think there ought to be some mention of the relationship between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION page. Then the user can cross-reference to read what the 'force_alter' actually does. == doc/src/sgml/ref/create_subscription.sgml 4.3. + + +force_alter (boolean) + + + Specifies whether the subscription can be altered + two_phase option, even if there are prepared
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Kuroda-san, I'm having second thoughts about how these patches mention the option values "on|off". These are used in the ALTER SUBSCRIPTION document page for 'two_phase' and 'failover' parameters, and then those "on|off" get propagated to the code comments, error messages, and tests... Now I see that on the CREATE SUBSCRIPTION page [1], every boolean parameter (even including 'two_phase' and 'failover') is described in terms of "true|false" (not "on|off"). In hindsight, it is probably better to refer only to true|false everywhere for these boolean parameters, instead of sometimes using different values like on|off. What do you think? == [1] https://www.postgresql.org/docs/devel/sql-createsubscription.html Kind Regards, Peter Smith. Fujitsu Australia
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, Thanks for giving comments! New patch was posted in [1]. > 0.1 General - Patch name > > /SUBSCIRPTION/SUBSCRIPTION/ Fixed. > == > 0.2 General - Apply > > FYI, there are whitespace warnings: > > git > apply ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTI > ON-.-S.patch > ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.- > S.patch:191: > trailing whitespace. > # command will abort the prepared transaction and succeed. > warning: 1 line adds whitespace errors. I didn't recognize, fixed. > == > 0.3 General - Regression test fails > > The subscription regression tests are not working. > > ok 158 + publication 1187 ms > not ok 159 + subscription 123 ms > > See review comments #4 and #5 below for the reason why. Yeah, I missed to update the expected result. Fixed. > == > src/sgml/ref/alter_subscription.sgml > > 1. > >The two_phase parameter can only be altered when > the > - subscription is disabled. When altering the parameter from > on > - to off, the backend process checks prepared > - transactions done by the logical replication worker and aborts them. > + subscription is disabled. Altering the parameter from > on > + to off will be failed when there are prepared > + transactions done by the logical replication worker. If you want to > alter > + the parameter forcibly in this case, force_alter > + option must be set to on at the same time. If > + specified, the backend process aborts prepared transactions. > > 1a. > That "will be failed when..." seems strange. Maybe say "will give an > error when..." > > ~ > 1b. > Because "force" is a verb, I think true/false is more natural than > on/off for this new boolean option. e.g. it acts more like a "flag" > than a "mode". See all the other boolean options in CREATE > SUBSCRIPTION -- those are mostly all verbs too and are all true/false > AFAIK. Fixed, but note that the part was moved. > > == > > 2. CREATE SUBSCRIPTION > > For my previous review, two comments [v7-0004#2] and [v7-0004#3] were > not addressed. Kuroda-san wrote: > Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should > we modify to accept and add the description in the doc? > > ~ > > Yes, that is what I am suggesting. IMO it is odd for the user to be > able to ALTER a parameter that cannot be included in the CREATE > SUBSCRIPTION in the first place. AFAIK there are no other parameters > that behave that way. Hmm. I felt that this change required the new attribute in pg_subscription system catalog. Previously I did not like because it contains huge change, but...I tried to do. New attribute 'subforcealter', and some parts were updated accordingly. > src/backend/commands/subscriptioncmds.c > > 3. AlterSubscription > > + if (!opts.force_alter) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot alter %s when there are prepared transactions", > + "two_phase = off"), > + errhint("Resolve these transactions or set %s at the same time, and > then try again.", > + "force_alter = true"))); > > I think saying "at the same time" in the hint is unnecessary. Surely > the user is allowed to set this parameter separately if they want to? > > e.g. > ALTER SUBSCRIPTION sub SET (force_alter=true); > ALTER SUBSCRIPTION sub SET (two_phase=off); Actually, it was correct. Since force_alter was not recorded in the system catalog, it must be specified at the same time. Now, we allow to be separated, so removed. > == > src/test/regress/expected/subscription.out > > 4. > +-- fail - force_alter cannot be set alone > +ALTER SUBSCRIPTION regress_testsub SET (force_alter = true); > +ERROR: force_alter must be specified with two_phase > > This error cannot happen. You removed that error! Fixed. > == > src/test/subscription/t/099_twophase_added.pl > > 6. > +# Try altering the two_phase option to "off." The command will fail since > there > +# is a prepared transaction and the force option is not specified. > +my $stdout; > +my $stderr; > + > +($result, $stdout, $stderr) = $node_subscriber->psql( > + 'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);"); > +ok($stderr =~ /cannot alter two_phase = off when there are prepared > transactions/, > + 'ALTER SUBSCRIPTION failed'); > > /force option is not specified./'force_alter' option is not specified as > true./ Fixed. > > 7. > +# Verify the prepared transaction still exists > +$result = $node_subscriber->safe_psql('postgres', > +"SELECT count(*) FROM pg_prepared_xacts;"); > +is($result, q(1), "prepared transaction still exits"); > + > > TYPO: /exits/exists/ Fixed. > > ~~~ > > 8. > +# Alter the two_phase with the force_alter option. Apart from the above, the > +# command will abort the prepared transaction and succeed. >
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, Thanks for reviewing! Patch can be available in [1]. > == > src/sgml/ref/alter_subscription.sgml > > 1. > + > + The two_phase parameter can only be altered when > the > + subscription is disabled. When altering the parameter from > on > + to off, the backend process checks prepared > + transactions done by the logical replication worker and aborts them. > + > > The text may be OK as-is, but I was wondering if it might be better to > give a more verbose explanation. > > BEFORE > ... the backend process checks prepared transactions done by the > logical replication worker and aborts them. > > SUGGESTION > ... the backend process checks for any incomplete prepared > transactions done by the logical replication worker (from when > two_phase parameter was still "on") and, if any are found, those are > aborted. > Fixed. > == > src/backend/commands/subscriptioncmds.c > > 2. AlterSubscription > > - /* > - * Since the altering two_phase option of subscriptions > - * also leads to the change of slot option, this command > - * cannot be rolled back. So prevent we are in the > - * transaction block. > + * If two_phase was enabled, there is a possibility the > + * transactions has already been PREPARE'd. They must be > + * checked and rolled back. > */ > > BEFORE > ... there is a possibility the transactions has already been PREPARE'd. > > SUGGESTION > ... there is a possibility that transactions have already been PREPARE'd. Fixed. > 3. AlterSubscription > + /* > + * Since the altering two_phase option of subscriptions > + * (especially on->off case) also leads to the > + * change of slot option, this command cannot be rolled > + * back. So prevent we are in the transaction block. > + */ > PreventInTransactionBlock(isTopLevel, > "ALTER SUBSCRIPTION ... SET (two_phase = off)"); > > > This comment is a bit vague and includes some typos, but IIUC these > problems will already be addressed by the 0002 patch changes.AFAIK > patch 0003 is only moving the 0002 comment. Yeah, the comment was updated accordingly. > > == > src/test/subscription/t/099_twophase_added.pl > > 4. > +# Check the case that prepared transactions exist on the subscriber node > +# > +# If the two_phase is altering from "on" to "off" and there are prepared > +# transactions on the subscriber, they must be aborted. This test checks it. > + > > Similar to the comment that I gave for v8-0002. I think there should > be comment for the major test comment to > distinguish it from comments for the sub-steps. Added. > 5. > +# Verify the prepared transaction are aborted because two_phase is changed to > +# "off". > +$result = $node_subscriber->safe_psql('postgres', > +"SELECT count(*) FROM pg_prepared_xacts;"); > +is($result, q(0), "prepared transaction done by worker is aborted"); > + > > /the prepared transaction are aborted/any prepared transactions are aborted/ Fixed. [1]: https://www.postgresql.org/message-id/OSBPR01MB2552FEA48D265EA278AA9F7AF5E22%40OSBPR01MB2552.jpnprd01.prod.outlook.com Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for v8-0002. == Commit message 1. Regarding the off->on case, the logical replication already has a mechanism for it, so there is no need to do anything special for the on->off case; however, we must connect to the publisher and expressly change the parameter. The operation cannot be rolled back, and altering the parameter from "on" to "off" within a transaction is prohibited. ~ I think the difference between "off"-to"on" and "on"-to"off" needs to be explained in more detail. Specifically "already has a mechanism for it" seems very vague. == src/backend/commands/subscriptioncmds.c 2. /* - * The changed two_phase option of the slot can't be rolled - * back. + * Since the altering two_phase option of subscriptions + * also leads to the change of slot option, this command + * cannot be rolled back. So prevent we are in the + * transaction block. */ - PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase)"); + if (!opts.twophase) + PreventInTransactionBlock(isTopLevel, + 2a. There is a typo: "So prevent we are". SUGGESTION (minor adjustment and typo fix) Since altering the two_phase option of subscriptions also leads to changing the slot option, this command cannot be rolled back. So prevent this if we are in a transaction block. ~ 2b. But, in my previous review [v7-0002#3] I asked if the comment could explain why this check is only needed for two_phase "on"-to-"off" but not for "off"-to-"on". That explanation/reason is still not yet given in the latest comment. ~~~ 3. /* - * Try to acquire the connection necessary for altering slot. + * Check the need to alter the replication slot. Failover and two_phase + * options are controlled by both the publisher (as a slot option) and the + * subscriber (as a subscription option). + */ + update_failover = replaces[Anum_pg_subscription_subfailover - 1]; + update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] && + !opts.twophase); (This is similar to the previous comment). In my previous review [v7-0002#3a] I asked why update_two_phase is TRUE only if 'two-phase' is being updated "on"-to-"off", but not when it is being updated "off"-to-"on". That explanation/reason is still not yet given in the latest comment. == src/backend/replication/libpqwalreceiver/libpqwalreceiver.c 4. - appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( FAILOVER %s, TWO_PHASE %s )", - quote_identifier(slotname), - failover ? "true" : "false", - two_phase ? "true" : "false"); + appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( ", + quote_identifier(slotname)); + + if (failover) + appendStringInfo(, "FAILOVER %s", + *failover ? "true" : "false"); + + if (failover && two_phase) + appendStringInfo(, ", "); + + if (two_phase) + appendStringInfo(, "TWO_PHASE %s", + *two_phase ? "true" : "false"); + + appendStringInfoString(, ");"); It will be better if that last line includes the extra space like I had suggested in [v7-0002#4a] so the result will have the same spacing as in the original code. e.g. + appendStringInfoString(, " );"); == src/test/subscription/t/099_twophase_added.pl 5. +# Check the case that prepared transactions exist on the publisher node. +# +# Since the two_phase is "off", then normally, this PREPARE will do nothing +# until the COMMIT PREPARED, but in this test, we toggle the two_phase to "on" +# again before the COMMIT PREPARED happens. This is a major test part so IMO this comment should have # like it had before, to distinguish it from all the sub-step comments. == My v7-0002 review - https://www.postgresql.org/message-id/CAHut%2BPtu_w_UCGR-5DbenA%2By7wRiA8QPi_ZP%3DCCJ3SGdTn-%3D%3Dg%40mail.gmail.com Kind Regards, Peter Smith. Fujitsu Australia.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some comments for v8-0004 == 0.1 General - Patch name /SUBSCIRPTION/SUBSCRIPTION/ == 0.2 General - Apply FYI, there are whitespace warnings: git apply ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch:191: trailing whitespace. # command will abort the prepared transaction and succeed. warning: 1 line adds whitespace errors. == 0.3 General - Regression test fails The subscription regression tests are not working. ok 158 + publication 1187 ms not ok 159 + subscription 123 ms See review comments #4 and #5 below for the reason why. == src/sgml/ref/alter_subscription.sgml 1. The two_phase parameter can only be altered when the - subscription is disabled. When altering the parameter from on - to off, the backend process checks prepared - transactions done by the logical replication worker and aborts them. + subscription is disabled. Altering the parameter from on + to off will be failed when there are prepared + transactions done by the logical replication worker. If you want to alter + the parameter forcibly in this case, force_alter + option must be set to on at the same time. If + specified, the backend process aborts prepared transactions. 1a. That "will be failed when..." seems strange. Maybe say "will give an error when..." ~ 1b. Because "force" is a verb, I think true/false is more natural than on/off for this new boolean option. e.g. it acts more like a "flag" than a "mode". See all the other boolean options in CREATE SUBSCRIPTION -- those are mostly all verbs too and are all true/false AFAIK. == 2. CREATE SUBSCRIPTION For my previous review, two comments [v7-0004#2] and [v7-0004#3] were not addressed. Kuroda-san wrote: Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should we modify to accept and add the description in the doc? ~ Yes, that is what I am suggesting. IMO it is odd for the user to be able to ALTER a parameter that cannot be included in the CREATE SUBSCRIPTION in the first place. AFAIK there are no other parameters that behave that way. == src/backend/commands/subscriptioncmds.c 3. AlterSubscription + if (!opts.force_alter) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot alter %s when there are prepared transactions", + "two_phase = off"), + errhint("Resolve these transactions or set %s at the same time, and then try again.", + "force_alter = true"))); I think saying "at the same time" in the hint is unnecessary. Surely the user is allowed to set this parameter separately if they want to? e.g. ALTER SUBSCRIPTION sub SET (force_alter=true); ALTER SUBSCRIPTION sub SET (two_phase=off); == src/test/regress/expected/subscription.out 4. +-- fail - force_alter cannot be set alone +ALTER SUBSCRIPTION regress_testsub SET (force_alter = true); +ERROR: force_alter must be specified with two_phase This error cannot happen. You removed that error! == src/test/regress/sql/subscription.sql 5. +-- fail - force_alter cannot be set alone +ALTER SUBSCRIPTION regress_testsub SET (force_alter = true); Why is this being tested? You removed that error condition. == src/test/subscription/t/099_twophase_added.pl 6. +# Try altering the two_phase option to "off." The command will fail since there +# is a prepared transaction and the force option is not specified. +my $stdout; +my $stderr; + +($result, $stdout, $stderr) = $node_subscriber->psql( + 'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);"); +ok($stderr =~ /cannot alter two_phase = off when there are prepared transactions/, + 'ALTER SUBSCRIPTION failed'); /force option is not specified./'force_alter' option is not specified as true./ ~~~ 7. +# Verify the prepared transaction still exists +$result = $node_subscriber->safe_psql('postgres', +"SELECT count(*) FROM pg_prepared_xacts;"); +is($result, q(1), "prepared transaction still exits"); + TYPO: /exits/exists/ ~~~ 8. +# Alter the two_phase with the force_alter option. Apart from the above, the +# command will abort the prepared transaction and succeed. +$node_subscriber->safe_psql('postgres', +"ALTER SUBSCRIPTION regress_sub SET (two_phase = off, force_alter = true);"); +$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION regress_sub ENABLE;"); + What does "Apart from the above" mean? Be more explicit. ~~~ 9. +# Verify the prepared transaction are aborted $result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts;"); is($result, q(0), "prepared transaction done by worker is aborted"); /transaction are aborted/transaction was aborted/ == Response to my v7-0004 review --
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for v8-0003. == src/sgml/ref/alter_subscription.sgml 1. + + The two_phase parameter can only be altered when the + subscription is disabled. When altering the parameter from on + to off, the backend process checks prepared + transactions done by the logical replication worker and aborts them. + The text may be OK as-is, but I was wondering if it might be better to give a more verbose explanation. BEFORE ... the backend process checks prepared transactions done by the logical replication worker and aborts them. SUGGESTION ... the backend process checks for any incomplete prepared transactions done by the logical replication worker (from when two_phase parameter was still "on") and, if any are found, those are aborted. == src/backend/commands/subscriptioncmds.c 2. AlterSubscription - /* - * Since the altering two_phase option of subscriptions - * also leads to the change of slot option, this command - * cannot be rolled back. So prevent we are in the - * transaction block. + * If two_phase was enabled, there is a possibility the + * transactions has already been PREPARE'd. They must be + * checked and rolled back. */ BEFORE ... there is a possibility the transactions has already been PREPARE'd. SUGGESTION ... there is a possibility that transactions have already been PREPARE'd. ~~~ 3. AlterSubscription + /* + * Since the altering two_phase option of subscriptions + * (especially on->off case) also leads to the + * change of slot option, this command cannot be rolled + * back. So prevent we are in the transaction block. + */ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase = off)"); This comment is a bit vague and includes some typos, but IIUC these problems will already be addressed by the 0002 patch changes.AFAIK patch 0003 is only moving the 0002 comment. == src/test/subscription/t/099_twophase_added.pl 4. +# Check the case that prepared transactions exist on the subscriber node +# +# If the two_phase is altering from "on" to "off" and there are prepared +# transactions on the subscriber, they must be aborted. This test checks it. + Similar to the comment that I gave for v8-0002. I think there should be comment for the major test comment to distinguish it from comments for the sub-steps. ~~~ 5. +# Verify the prepared transaction are aborted because two_phase is changed to +# "off". +$result = $node_subscriber->safe_psql('postgres', +"SELECT count(*) FROM pg_prepared_xacts;"); +is($result, q(0), "prepared transaction done by worker is aborted"); + /the prepared transaction are aborted/any prepared transactions are aborted/ == Kind Regards, Peter Smith Fujitsu Australia
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, > > == > Commit message > > 1. > A detailed commit message is needed to describe the purpose and > details of this patch. Added. > == > doc/src/sgml/ref/alter_subscription.sgml > > 2. CREATE SUBSCRIPTION > > Shouldn't there be an entry for "force_alter" parameter in the CREATE > SUBSCRIPTION "parameters" section, instead of just vaguely mentioning > it in passing when describing the "two_phase" in ALTER SUBSCRIPTION? > > 3. ALTER SUBSCRIPTION - alterable parameters > > And shouldn't this new option also be named in the ALTER SUBSCRIPTION > list: "The parameters that can be altered are..." Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should we modify to accept and add the description in the doc? This was not accepted. > == > src/backend/commands/subscriptioncmds.c > > 4. > XLogRecPtr lsn; > + bool twophase_force; > } SubOpts; > > IMO this field ought to be called 'force_alter' to be the same as the > option name. Sure, now it is only relevant for 'two_phase', but that > might not always be the case in the future. Modified. > 5. AlterSubscription > > + /* > + * Abort prepared transactions if force option is also > + * specified. Otherwise raise an ERROR. > + */ > + if (!opts.twophase_force) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot alter %s when there are prepared transactions", > + "two_phase = false"))); > + > > 5a. > /if force option is also specified/only if the 'force_alter' option is true/ Modified. > > 5b. > "two_phase = false" -- IMO that should say "two_phase = off" Modified. > 5c. > IMO this ereport should include a errhint to tell the user they can > use 'force_alter = true' to avoid getting this error. Hint was added. > 6. > > + /* force_alter cannot be used standalone */ > + if (IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER) && > + !IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT)) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("%s must be specified with %s", > + "force_alter", "two_phase"))); > + > > IMO this rule is not necessary so the code should be removed. I think > using 'force_alter' standalone doesn't do anything at all (certainly, > it does no harm) so why add more complications (more rules, more code, > more tests) just for the sake of it? Removed. So standalone 'force_alter' is now no-op. > src/test/subscription/t/099_twophase_added.pl > > 7. > +$node_subscriber->safe_psql('postgres', > +"ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter = on);"); > > "force" is a verb, so it is better to say 'force_alter = true' instead > of 'force_alter = on'. Fixed. Actually not sure it is better because I'm not a native. > 8. > $result = $node_subscriber->safe_psql('postgres', > "SELECT count(*) FROM pg_prepared_xacts;"); > is($result, q(0), "prepared transaction done by worker is aborted"); > > +$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub > ENABLE;"); > + > > I felt the ENABLE statement should be above the SELECT statement so > that the code is more like it was before applying the patch. Fixed. Please see attached patch set. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/ v8-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: v8-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch v8-0002-Alter-slot-option-two_phase-only-when-altering-on.patch Description: v8-0002-Alter-slot-option-two_phase-only-when-altering-on.patch v8-0003-Abort-prepared-transactions-while-altering-two_ph.patch Description: v8-0003-Abort-prepared-transactions-while-altering-two_ph.patch v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch Description: v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, > Commit Message > > 1. > The patch needs a commit message to describe the purpose and highlight > any limitations and other details. Added. > == > doc/src/sgml/ref/alter_subscription.sgml > > 2. > + > + > + The two_phase parameter can only be altered when > the > + subscription is disabled. When altering the parameter from > true > + to false, the backend process checks prepared > + transactions done by the logical replication worker and aborts them. > + > > Here, the para is referring to "true" and "false" but earlier on this > page it talks about "twophase = off". IMO it is better to use a > consistent terminology like "on|off" everywhere instead of randomly > changing the way it is described each time. I checked contents and changed to "on|off". > == > src/backend/commands/subscriptioncmds.c > > 3. AlterSubscription > > if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT)) > { > + List *prepared_xacts = NIL; > > This 'prepared_xacts' can be declared at a lower scrope because it is > only used if (!opts.twophase). > > Furthermore, IIUC you don't need to assign NIL in the declaration > because there is no chance for it to be unassigned anyway. Made the namespace narrower and initialization was removed. > ~~~ > > 4. AlterSubscription > > + /* > + * The changed two_phase option (true->false) of the > + * slot can't be rolled back. > + */ > PreventInTransactionBlock(isTopLevel, > "ALTER SUBSCRIPTION ... SET (two_phase = off)"); > > Here is another example of inconsistent mixing of the terminology > where the comment says "true"/"false" but the message says "off". > Let's keep everything consistent. (I prefer on|off). Modified. > ~~~ > > 5. > + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && > + (prepared_xacts = GetGidListBySubid(subid)) != NIL) > + { > + ListCell *cell; > + > + /* Abort all listed transactions */ > + foreach(cell, prepared_xacts) > + FinishPreparedTransaction((char *) lfirst(cell), > + false); > + > + list_free(prepared_xacts); > + } > > 5A. > IIRC there is a cleaner way to write this loop without needing > ListCell variable -- e.g. foreach_ptr() macro? Changed. > 5B. > Shouldn't this be using list_free_deep() so the pstrdup gid gets freed too? Yeah, fixed. > == > src/test/subscription/t/099_twophase_added.pl > > 6. > +## > +# Check the case that prepared transactions exist on subscriber node > +## > + > > Give some more detailed comments here similar to the review comment of > patch v7-0002 for the other part of this TAP test. > > ~~~ > > 7. TAP test - comments > > Same as for my v7-0002 review comments, I think this test case also > needs a few more one-line comments to describe the sub-steps. e.g.: > > # prepare a transaction to insert some rows to the table > > # verify the prepared tx is replicated to the subscriber (because > 'two_phase = on') > > # toggle the two_phase to 'off' *before* the COMMIT PREPARED > > # verify the prepared tx got aborted > > # do the COMMIT PREPARED (note that now two_phase is 'off') > > # verify the inserted rows got replicated ok They were fixed based on your previous comments. > > 8. TAP test - subscription name > > It's better to rename the SUBSCRIPTION in this TAP test so you can > avoid getting log warnings like: > > psql::4: WARNING: subscriptions created by regression test > cases should have names starting with "regress_" > psql::4: NOTICE: created replication slot "sub" on publisher Modified, but it was included in 0001. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, > == > Commit message > > 1. > IIUC there is quite a lot of subtlety and details about why the slot > option needs to be changed only when altering "true" to "false", but > not when altering "false" to "true". > > It also should explain why PreventInTransactionBlock is only needed > when altering two_phase "true" to "false". > > Please include a commit message to describe all those tricky details. Added. > == > src/backend/commands/subscriptioncmds.c > > 2. AlterSubscription > > - PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET > (two_phase)"); > + if (!opts.twophase) > + PreventInTransactionBlock(isTopLevel, > + "ALTER SUBSCRIPTION ... SET (two_phase = off)"); > > IMO this needs a comment to explain why PreventInTransactionBlock is > only needed when changing the 'two_phase' option from on to off. Added. Thoutht? > 3. AlterSubscription > > /* > * Try to acquire the connection necessary for altering slot. > * > * This has to be at the end because otherwise if there is an error while > * doing the database operations we won't be able to rollback altered > * slot. > */ > if (replaces[Anum_pg_subscription_subfailover - 1] || > replaces[Anum_pg_subscription_subtwophasestate - 1]) > { > bool must_use_password; > char*err; > WalReceiverConn *wrconn; > bool failover_needs_to_be_updated; > bool two_phase_needs_to_be_updated; > > /* Load the library providing us libpq calls. */ > load_file("libpqwalreceiver", false); > > /* Try to connect to the publisher. */ > must_use_password = sub->passwordrequired && !sub->ownersuperuser; > wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password, > sub->name, ); > if (!wrconn) > ereport(ERROR, > (errcode(ERRCODE_CONNECTION_FAILURE), > errmsg("could not connect to the publisher: %s", err))); > > /* > * Consider which slot option must be altered. > * > * We must alter the failover option whenever subfailover is updated. > * Two_phase, however, is altered only when changing true to false. > */ > failover_needs_to_be_updated = > replaces[Anum_pg_subscription_subfailover - 1]; > two_phase_needs_to_be_updated = > (replaces[Anum_pg_subscription_subtwophasestate - 1] && > !opts.twophase); > > PG_TRY(); > { > if (two_phase_needs_to_be_updated || failover_needs_to_be_updated) > walrcv_alter_slot(wrconn, sub->slotname, > failover_needs_to_be_updated ? : NULL, > two_phase_needs_to_be_updated ? : NULL); > } > PG_FINALLY(); > { > walrcv_disconnect(wrconn); > } > PG_END_TRY(); > } > > 3a. > The block comment "Consider which slot option must be altered..." says > WHEN those options need to be updated, but it doesn't say WHY. e.g. > why only update the 'two_phase' when it is being disabled but not when > it is being enabled? In other words, I think there needs to be more > background/reason details given in this comment. > > ~~~ > > 3b. > Can't those 2 new variable assignments be done up-front and guard this > entire "if-block" instead of the current replaces[] guarding it? Then > the code is somewhat simplified. > > SUGGESTION: > /* > * > */ > update_failover = replaces[Anum_pg_subscription_subfailover - 1]; > update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - > 1] && !opts.twophase); > > /* > * Try to acquire the connection necessary for altering slot. > * > * This has to be at the end because otherwise if there is an error while > * doing the database operations we won't be able to rollback altered > * slot. > */ > if (update_failover || update_two_phase) > { > ... > > /* Load the library providing us libpq calls. */ > load_file("libpqwalreceiver", false); > > /* Try to connect to the publisher. */ > must_use_password = sub->passwordrequired && !sub->ownersuperuser; > wrconn = walrcv_connect(sub->conninfo, true, true, > must_use_password, sub->name, ); > if (!wrconn) > ereport(ERROR, ...); > > PG_TRY(); > { > walrcv_alter_slot(wrconn, sub->slotname, > update_failover ? : NULL, > update_two_phase ? : NULL); > } > PG_FINALLY(); > { > walrcv_disconnect(wrconn); > } > PG_END_TRY(); > } Two variables were added and comments were updated. > .../libpqwalreceiver/libpqwalreceiver.c > > 4. > + appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( ", > + quote_identifier(slotname)); > + > + if (failover) > + appendStringInfo(, "FAILOVER %s ", > + (*failover) ? "true" : "false"); > + > + if (two_phase) > + appendStringInfo(, "TWO_PHASE %s%s ", > + (*two_phase) ? "true" : "false", > + failover ? ", " : ""); > + > + appendStringInfoString(, ");"); > > 4a. > IIUC the comma logic here was broken in v7 when you swapped the order. > Anyway, IMO it will be better NOT to try combining that comma logic > with the existing appendStringInfo. Doing it separately is both easier > and less error-prone. > > Furthermore, the parentheses like "(*two_phase)" instead of just > "*two_phase" seemed a bit overkill. > > SUGGESTION: > + if (failover) > +
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, Thanks for reviewing! The patch will be posted in the upcoming post. > == > src/backend/access/transam/twophase.c > > 1. IsTwoPhaseTransactionGidForSubid > > +/* > + * IsTwoPhaseTransactionGidForSubid > + * Check whether the given GID is formed by TwoPhaseTransactionGid. > + */ > +static bool > +IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid) > > I think the function comment should mention something about 'subid'. > > SUGGESTION > Check whether the given GID (as formed by TwoPhaseTransactionGid) is > for the specified 'subid'. Fixed. > src/backend/commands/subscriptioncmds.c > > 2. AlterSubscription > > + if (!opts.twophase && > + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED > && > + LookupGXactBySubid(subid)) > + /* Add error message */ > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot disable two_phase when uncommitted prepared > transactions present"), > + errhint("Resolve these transactions and try again"))); > > The comment "/* Add error message */" seems unnecessary. Yeah, this was an internal flag. Removed. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for patch v7-0004 == Commit message 1. A detailed commit message is needed to describe the purpose and details of this patch. == doc/src/sgml/ref/alter_subscription.sgml 2. CREATE SUBSCRIPTION Shouldn't there be an entry for "force_alter" parameter in the CREATE SUBSCRIPTION "parameters" section, instead of just vaguely mentioning it in passing when describing the "two_phase" in ALTER SUBSCRIPTION? ~ 3. ALTER SUBSCRIPTION - alterable parameters And shouldn't this new option also be named in the ALTER SUBSCRIPTION list: "The parameters that can be altered are..." == src/backend/commands/subscriptioncmds.c 4. XLogRecPtr lsn; + bool twophase_force; } SubOpts; IMO this field ought to be called 'force_alter' to be the same as the option name. Sure, now it is only relevant for 'two_phase', but that might not always be the case in the future. ~~~ 5. AlterSubscription + /* + * Abort prepared transactions if force option is also + * specified. Otherwise raise an ERROR. + */ + if (!opts.twophase_force) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot alter %s when there are prepared transactions", + "two_phase = false"))); + 5a. /if force option is also specified/only if the 'force_alter' option is true/ ~ 5b. "two_phase = false" -- IMO that should say "two_phase = off" ~ 5c. IMO this ereport should include a errhint to tell the user they can use 'force_alter = true' to avoid getting this error. ~~~ 6. + /* force_alter cannot be used standalone */ + if (IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER) && + !IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("%s must be specified with %s", + "force_alter", "two_phase"))); + IMO this rule is not necessary so the code should be removed. I think using 'force_alter' standalone doesn't do anything at all (certainly, it does no harm) so why add more complications (more rules, more code, more tests) just for the sake of it? == src/test/subscription/t/099_twophase_added.pl 7. +$node_subscriber->safe_psql('postgres', +"ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter = on);"); "force" is a verb, so it is better to say 'force_alter = true' instead of 'force_alter = on'. ~~~ 8. $result = $node_subscriber->safe_psql('postgres', "SELECT count(*) FROM pg_prepared_xacts;"); is($result, q(0), "prepared transaction done by worker is aborted"); +$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE;"); + I felt the ENABLE statement should be above the SELECT statement so that the code is more like it was before applying the patch. == Kind Regards, Peter Smith. Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for the patch v7-0003. == Commit Message 1. The patch needs a commit message to describe the purpose and highlight any limitations and other details. == doc/src/sgml/ref/alter_subscription.sgml 2. + + + The two_phase parameter can only be altered when the + subscription is disabled. When altering the parameter from true + to false, the backend process checks prepared + transactions done by the logical replication worker and aborts them. + Here, the para is referring to "true" and "false" but earlier on this page it talks about "twophase = off". IMO it is better to use a consistent terminology like "on|off" everywhere instead of randomly changing the way it is described each time. == src/backend/commands/subscriptioncmds.c 3. AlterSubscription if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT)) { + List *prepared_xacts = NIL; This 'prepared_xacts' can be declared at a lower scrope because it is only used if (!opts.twophase). Furthermore, IIUC you don't need to assign NIL in the declaration because there is no chance for it to be unassigned anyway. ~~~ 4. AlterSubscription + /* + * The changed two_phase option (true->false) of the + * slot can't be rolled back. + */ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase = off)"); Here is another example of inconsistent mixing of the terminology where the comment says "true"/"false" but the message says "off". Let's keep everything consistent. (I prefer on|off). ~~~ 5. + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && + (prepared_xacts = GetGidListBySubid(subid)) != NIL) + { + ListCell *cell; + + /* Abort all listed transactions */ + foreach(cell, prepared_xacts) + FinishPreparedTransaction((char *) lfirst(cell), + false); + + list_free(prepared_xacts); + } 5A. IIRC there is a cleaner way to write this loop without needing ListCell variable -- e.g. foreach_ptr() macro? ~ 5B. Shouldn't this be using list_free_deep() so the pstrdup gid gets freed too? == src/test/subscription/t/099_twophase_added.pl 6. +## +# Check the case that prepared transactions exist on subscriber node +## + Give some more detailed comments here similar to the review comment of patch v7-0002 for the other part of this TAP test. ~~~ 7. TAP test - comments Same as for my v7-0002 review comments, I think this test case also needs a few more one-line comments to describe the sub-steps. e.g.: # prepare a transaction to insert some rows to the table # verify the prepared tx is replicated to the subscriber (because 'two_phase = on') # toggle the two_phase to 'off' *before* the COMMIT PREPARED # verify the prepared tx got aborted # do the COMMIT PREPARED (note that now two_phase is 'off') # verify the inserted rows got replicated ok ~~~ 8. TAP test - subscription name It's better to rename the SUBSCRIPTION in this TAP test so you can avoid getting log warnings like: psql::4: WARNING: subscriptions created by regression test cases should have names starting with "regress_" psql::4: NOTICE: created replication slot "sub" on publisher == Kind Regards, Peter Smith. Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for v7-0002 == Commit message 1. IIUC there is quite a lot of subtlety and details about why the slot option needs to be changed only when altering "true" to "false", but not when altering "false" to "true". It also should explain why PreventInTransactionBlock is only needed when altering two_phase "true" to "false". Please include a commit message to describe all those tricky details. == src/backend/commands/subscriptioncmds.c 2. AlterSubscription - PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase)"); + if (!opts.twophase) + PreventInTransactionBlock(isTopLevel, + "ALTER SUBSCRIPTION ... SET (two_phase = off)"); IMO this needs a comment to explain why PreventInTransactionBlock is only needed when changing the 'two_phase' option from on to off. ~~~ 3. AlterSubscription /* * Try to acquire the connection necessary for altering slot. * * This has to be at the end because otherwise if there is an error while * doing the database operations we won't be able to rollback altered * slot. */ if (replaces[Anum_pg_subscription_subfailover - 1] || replaces[Anum_pg_subscription_subtwophasestate - 1]) { bool must_use_password; char*err; WalReceiverConn *wrconn; bool failover_needs_to_be_updated; bool two_phase_needs_to_be_updated; /* Load the library providing us libpq calls. */ load_file("libpqwalreceiver", false); /* Try to connect to the publisher. */ must_use_password = sub->passwordrequired && !sub->ownersuperuser; wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password, sub->name, ); if (!wrconn) ereport(ERROR, (errcode(ERRCODE_CONNECTION_FAILURE), errmsg("could not connect to the publisher: %s", err))); /* * Consider which slot option must be altered. * * We must alter the failover option whenever subfailover is updated. * Two_phase, however, is altered only when changing true to false. */ failover_needs_to_be_updated = replaces[Anum_pg_subscription_subfailover - 1]; two_phase_needs_to_be_updated = (replaces[Anum_pg_subscription_subtwophasestate - 1] && !opts.twophase); PG_TRY(); { if (two_phase_needs_to_be_updated || failover_needs_to_be_updated) walrcv_alter_slot(wrconn, sub->slotname, failover_needs_to_be_updated ? : NULL, two_phase_needs_to_be_updated ? : NULL); } PG_FINALLY(); { walrcv_disconnect(wrconn); } PG_END_TRY(); } 3a. The block comment "Consider which slot option must be altered..." says WHEN those options need to be updated, but it doesn't say WHY. e.g. why only update the 'two_phase' when it is being disabled but not when it is being enabled? In other words, I think there needs to be more background/reason details given in this comment. ~~~ 3b. Can't those 2 new variable assignments be done up-front and guard this entire "if-block" instead of the current replaces[] guarding it? Then the code is somewhat simplified. SUGGESTION: /* * */ update_failover = replaces[Anum_pg_subscription_subfailover - 1]; update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] && !opts.twophase); /* * Try to acquire the connection necessary for altering slot. * * This has to be at the end because otherwise if there is an error while * doing the database operations we won't be able to rollback altered * slot. */ if (update_failover || update_two_phase) { ... /* Load the library providing us libpq calls. */ load_file("libpqwalreceiver", false); /* Try to connect to the publisher. */ must_use_password = sub->passwordrequired && !sub->ownersuperuser; wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password, sub->name, ); if (!wrconn) ereport(ERROR, ...); PG_TRY(); { walrcv_alter_slot(wrconn, sub->slotname, update_failover ? : NULL, update_two_phase ? : NULL); } PG_FINALLY(); { walrcv_disconnect(wrconn); } PG_END_TRY(); } == .../libpqwalreceiver/libpqwalreceiver.c 4. + appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( ", + quote_identifier(slotname)); + + if (failover) + appendStringInfo(, "FAILOVER %s ", + (*failover) ? "true" : "false"); + + if (two_phase) + appendStringInfo(, "TWO_PHASE %s%s ", + (*two_phase) ? "true" : "false", + failover ? ", " : ""); + + appendStringInfoString(, ");"); 4a. IIUC the comma logic here was broken in v7 when you swapped the order. Anyway, IMO it will be better NOT to try combining that comma logic with the existing appendStringInfo. Doing it separately is both easier and less error-prone. Furthermore, the parentheses like "(*two_phase)" instead of just "*two_phase" seemed a bit overkill. SUGGESTION: + if (failover) + appendStringInfo(, "FAILOVER %s", + *failover ? "true" : "false"); + + if (failover && two_phase) + appendStringInfo(, ", "); + + if (two_phase) + appendStringInfo(, "TWO_PHASE %s", + *two_phase ? "true" : "false"); + + appendStringInfoString(, " );"); ~~ 4b. Like I said above, IMO the current separator logic in v7 is broken. So it is a bit concerning
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Kuroda-san, Thanks for addressing most of my v6-0001 review comments. Below are some minor follow-up comments for v7-0001. == src/backend/access/transam/twophase.c 1. IsTwoPhaseTransactionGidForSubid +/* + * IsTwoPhaseTransactionGidForSubid + * Check whether the given GID is formed by TwoPhaseTransactionGid. + */ +static bool +IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid) I think the function comment should mention something about 'subid'. SUGGESTION Check whether the given GID (as formed by TwoPhaseTransactionGid) is for the specified 'subid'. == src/backend/commands/subscriptioncmds.c 2. AlterSubscription + if (!opts.twophase && + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && + LookupGXactBySubid(subid)) + /* Add error message */ + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot disable two_phase when uncommitted prepared transactions present"), + errhint("Resolve these transactions and try again"))); The comment "/* Add error message */" seems unnecessary. == Kind Regards, Peter Smith. Fujitsu Australia
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Peter, Thanks for reviewing! Here are updated patches. I updated patches only for HEAD. > == > Commit message > > 1. > This patch allows user to alter two_phase option > > /allows user/allows the user/ > > /to alter two_phase option/to alter the 'two_phase' option/ Fixed. > == > doc/src/sgml/ref/alter_subscription.sgml > > 2. > two_phase can be altered only for disabled subscription. > > SUGGEST > The two_phase parameter can only be altered when > the subscription is disabled. Fixed. > == > src/backend/access/transam/twophase.c > > 3. checkGid > + > +/* > + * checkGid > + */ > +static bool > +checkGid(char *gid, Oid subid) > +{ > + int ret; > + Oid subid_written, > + xid; > + > + ret = sscanf(gid, "pg_gid_%u_%u", _written, ); > + > + if (ret != 2 || subid != subid_written) > + return false; > + > + return true; > +} > > 3a. > The function comment should give more explanation of what it does. I > think this function is the counterpart of the TwoPhaseTransactionGid() > function of worker.c so the comment can say that too. Comments were updated. > 3b. > Indeed, perhaps the function name should be similar to > TwoPhaseTransactionGid. e.g. call it like > IsTwoPhaseTransactionGidForSubid? Replaced to IsTwoPhaseTransactionGidForSubid(). > 3c. > Probably 'xid' should be TransactionId instead of Oid. Right, fixed. > 3d. > Why not have a single return? > > SUGGESTION > return (ret == 2 && subid = subid_written); Fixed. > 3e. > I am wondering if the existing TwoPhaseTransactionGid function > currently in worker.c should be moved here because IMO these 2 > functions belong together and twophase.c seems the right place to put > them. +1, moved. > ~~~ > > 4. > +LookupGXactBySubid(Oid subid) > +{ > + bool found = false; > + > + LWLockAcquire(TwoPhaseStateLock, LW_SHARED); > + for (int i = 0; i < TwoPhaseState->numPrepXacts; i++) > + { > + GlobalTransaction gxact = TwoPhaseState->prepXacts[i]; > + > + /* Ignore not-yet-valid GIDs. */ > + if (gxact->valid && checkGid(gxact->gid, subid)) > + { > + found = true; > + break; > + } > + } > + LWLockRelease(TwoPhaseStateLock); > + return found; > +} > > AFAIK the gxact also has the 'xid' available, so won't it be better to > pass BOTH the 'xid' and 'subid' to the checkGid so you can do a full > comparison instead of comparing only the subid part of the gid? IIUC, the xid written in the gxact means the transaction id on the subscriber, but formatted GID has xid on the publisher. So the value cannot be used. > == > src/backend/commands/subscriptioncmds.c > > 5. AlterSubscription > > + /* XXX */ > + if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT)) > + { > > The "XXX" comment looks like it is meant to say something more... This flag was used only for me, removed. > ~~~ > > 6. > + /* > + * two_phase can be only changed for disabled > + * subscriptions > + */ > + if (form->subenabled) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot set %s for enabled subscription", > + "two_phase"))); > > 6a. > Should this have a more comprehensive comment giving the reason like > the 'failover' option has? Modified, but it is almost the same as failover's one. > 6b. > Maybe this should include a "translator" comment to say don't > translate the option name. Hmm, but other parts in AlterSubscription() does not have. For now, I kept current style. > ~~~ > > 7. > + /* Check whether the number of prepared transactions */ > + if (!opts.twophase && > + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED > && > + LookupGXactBySubid(subid)) > + ereport(ERROR, > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > + errmsg("cannot disable two_phase when uncommitted prepared > transactions present"))); > + > > 7a. > The first comment seems to be an incomplete sentence. I think it > should say something a bit like: > two_phase cannot be disabled if there are any uncommitted prepared > transactions present. Modified, but this part would be replaced by upcoming patches. > 7b. > Also, if ereport occurs what is the user supposed to do about it? > Shouldn't the ereport include some errhint with appropriate advice? The hint was added, but this part would be replaced by upcoming patches. > ~~~ > > 8. > + /* > + * The changed failover option of the slot can't be rolled > + * back. > + */ > + PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET > (two_phase)"); > + > + /* Change system catalog acoordingly */ > + values[Anum_pg_subscription_subtwophasestate - 1] = > + CharGetDatum(opts.twophase ? > + LOGICALREP_TWOPHASE_STATE_PENDING : > + LOGICALREP_TWOPHASE_STATE_DISABLED); > + replaces[Anum_pg_subscription_subtwophasestate - 1] = true; > + } > > Typo I think: /failover option/two_phase option/ Right, fixed. > == > .../libpqwalreceiver/libpqwalreceiver.c > > 9. > static void > libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, > - bool failover) > + bool
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for patch v6-0001 == Commit message 1. This patch allows user to alter two_phase option /allows user/allows the user/ /to alter two_phase option/to alter the 'two_phase' option/ == doc/src/sgml/ref/alter_subscription.sgml 2. two_phase can be altered only for disabled subscription. SUGGEST The two_phase parameter can only be altered when the subscription is disabled. == src/backend/access/transam/twophase.c 3. checkGid + +/* + * checkGid + */ +static bool +checkGid(char *gid, Oid subid) +{ + int ret; + Oid subid_written, + xid; + + ret = sscanf(gid, "pg_gid_%u_%u", _written, ); + + if (ret != 2 || subid != subid_written) + return false; + + return true; +} 3a. The function comment should give more explanation of what it does. I think this function is the counterpart of the TwoPhaseTransactionGid() function of worker.c so the comment can say that too. ~ 3b. Indeed, perhaps the function name should be similar to TwoPhaseTransactionGid. e.g. call it like IsTwoPhaseTransactionGidForSubid? ~ 3c. Probably 'xid' should be TransactionId instead of Oid. ~ 3d. Why not have a single return? SUGGESTION return (ret == 2 && subid = subid_written); ~ 3e. I am wondering if the existing TwoPhaseTransactionGid function currently in worker.c should be moved here because IMO these 2 functions belong together and twophase.c seems the right place to put them. ~~~ 4. +LookupGXactBySubid(Oid subid) +{ + bool found = false; + + LWLockAcquire(TwoPhaseStateLock, LW_SHARED); + for (int i = 0; i < TwoPhaseState->numPrepXacts; i++) + { + GlobalTransaction gxact = TwoPhaseState->prepXacts[i]; + + /* Ignore not-yet-valid GIDs. */ + if (gxact->valid && checkGid(gxact->gid, subid)) + { + found = true; + break; + } + } + LWLockRelease(TwoPhaseStateLock); + return found; +} AFAIK the gxact also has the 'xid' available, so won't it be better to pass BOTH the 'xid' and 'subid' to the checkGid so you can do a full comparison instead of comparing only the subid part of the gid? == src/backend/commands/subscriptioncmds.c 5. AlterSubscription + /* XXX */ + if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT)) + { The "XXX" comment looks like it is meant to say something more... ~~~ 6. + /* + * two_phase can be only changed for disabled + * subscriptions + */ + if (form->subenabled) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot set %s for enabled subscription", + "two_phase"))); 6a. Should this have a more comprehensive comment giving the reason like the 'failover' option has? ~~~ 6b. Maybe this should include a "translator" comment to say don't translate the option name. ~~~ 7. + /* Check whether the number of prepared transactions */ + if (!opts.twophase && + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && + LookupGXactBySubid(subid)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot disable two_phase when uncommitted prepared transactions present"))); + 7a. The first comment seems to be an incomplete sentence. I think it should say something a bit like: two_phase cannot be disabled if there are any uncommitted prepared transactions present. ~ 7b. Also, if ereport occurs what is the user supposed to do about it? Shouldn't the ereport include some errhint with appropriate advice? ~~~ 8. + /* + * The changed failover option of the slot can't be rolled + * back. + */ + PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase)"); + + /* Change system catalog acoordingly */ + values[Anum_pg_subscription_subtwophasestate - 1] = + CharGetDatum(opts.twophase ? + LOGICALREP_TWOPHASE_STATE_PENDING : + LOGICALREP_TWOPHASE_STATE_DISABLED); + replaces[Anum_pg_subscription_subtwophasestate - 1] = true; + } Typo I think: /failover option/two_phase option/ == .../libpqwalreceiver/libpqwalreceiver.c 9. static void libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, - bool failover) + bool two_phase, bool failover) Same comment as mentioned elsewhere (#15), IMO the new 'two_phase' parameter should be last. == src/backend/replication/logical/launcher.c 10. +/* + * Stop all the subscription workers. + */ +void +logicalrep_workers_stop(Oid subid) +{ + List*subworkers; + ListCell *lc; + + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); + subworkers = logicalrep_workers_find(subid, false); + LWLockRelease(LogicalRepWorkerLock); + foreach(lc, subworkers) + { + LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc); + + logicalrep_worker_stop(w->subid, w->relid); + } + list_free(subworkers); +} I was confused by the logicalrep_workers_find(subid, false). IIUC the 'false' means everything (instead of 'only_running') but then I don't know why we want to "stop" anything that is NOT running. OTOH I see that this code was extracted from where it was previously inlined in subscriptioncmds.c, so maybe the 'false' is necessary for another
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear hackers, Per recent commit (b29cbd3da), our patch needed to be rebased. Here is an updated version. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/ v6-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: v6-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch v6-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch Description: v6-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch v6-0003-Abort-prepared-transactions-while-altering-two_ph.patch Description: v6-0003-Abort-prepared-transactions-while-altering-two_ph.patch v6-0004-Add-force_alter-option.patch Description: v6-0004-Add-force_alter-option.patch
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Hayato, On Monday, April 22, 2024 15:54 MSK, "Hayato Kuroda (Fujitsu)" wrote: > Dear Vitaly, > > > I looked at the patch and realized that I can't try it easily in the near > > future > > because the solution I'm working on is based on PG16 or earlier. This patch > > is > > not easily applicable to the older releases. I have to port my solution to > > the > > master, which is not done yet. > > We also tried to port our patch for PG16, but the largest barrier was that a > replication command ALTER_SLOT is not supported. Since the slot option > two_phase > can't be modified, it is difficult to skip decoding PREPARE command even when > altering the option from true to false. > IIUC, Adding a new feature (e.g., replication command) for minor updates is > generally > prohibited > ... Attached patch set is a ported version for PG16, which breaks ABI. This can be used for testing purpose, but it won't be pushed to REL_16_STABLE. At least, this patchset can pass my github CI. Can you apply and check whether your issue is solved?It is fantastic. Thank you for your help! I will definitely try your patch. I need some time to test and incorporate it. I also plan to port my stuff to the master branch to simplify testing of patches. With best regards, Vitaly Davydov
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
> Dear Vitaly, > > > I looked at the patch and realized that I can't try it easily in the near > > future > > because the solution I'm working on is based on PG16 or earlier. This patch > > is > > not easily applicable to the older releases. I have to port my solution to > > the > > master, which is not done yet. > > We also tried to port our patch for PG16, but the largest barrier was that a > replication command ALTER_SLOT is not supported. Since the slot option > two_phase > can't be modified, it is difficult to skip decoding PREPARE command even when > altering the option from true to false. > IIUC, Adding a new feature (e.g., replication command) for minor updates is > generally > prohibited > > We must consider another approach for backpatching, but we do not have > solutions > for now. Attached patch set is a ported version for PG16, which breaks ABI. This can be used for testing purpose, but it won't be pushed to REL_16_STABLE. At least, this patchset can pass my github CI. Can you apply and check whether your issue is solved? Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/ REL_16_0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPTION.patch Description: REL_16_0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPTION.patch REL_16_0002-Alter-slot-option-two_phase-only-when-altering-true-.patch Description: REL_16_0002-Alter-slot-option-two_phase-only-when-altering-true-.patch REL_16_0003-Abort-prepared-transactions-while-altering-two_phase.patch Description: REL_16_0003-Abort-prepared-transactions-while-altering-two_phase.patch REL_16_0004-Add-force_alter-option.patch Description: REL_16_0004-Add-force_alter-option.patch
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear hackers, > Looking at this a bit more, maybe rolling back all prepared transactions on > the > subscriber when toggling two_phase from true to false might not be desirable > for the customer. Maybe we should have an option for customers to control > whether transactions should be rolled back or not. Maybe transactions should > only be rolled back if a "force" option is also set. What do people think? And here is a patch for adds new option "force_alter" (better name is very welcome). It could be used only when altering two_phase option. Let me share examples. Assuming that there are logical replication system with two_phase = on, and there are prepared transactions: ``` subscriber=# SELECT * FROM pg_prepared_xacts ; transaction | gid| prepared| owner | database -+--+---+--+-- 741 | pg_gid_16390_741 | 2024-04-22 08:02:34.727913+00 | postgres | postgres 742 | pg_gid_16390_742 | 2024-04-22 08:02:34.729486+00 | postgres | postgres (2 rows) ``` At that time, altering two_phase to false alone will be failed: ``` subscriber=# ALTER SUBSCRIPTION sub DISABLE ; ALTER SUBSCRIPTION subscriber=# ALTER SUBSCRIPTION sub SET (two_phase = off); ERROR: cannot alter two_phase = false when there are prepared transactions ``` It succeeds if force_alter is also expressly set. Prepared transactions will be aborted at that time. ``` subscriber=# ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter = on); ALTER SUBSCRIPTION subscriber=# SELECT * FROM pg_prepared_xacts ; transaction | gid | prepared | owner | database -+-+--+---+-- (0 rows) ``` Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/global/ v5-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: v5-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch v5-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch Description: v5-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch v5-0003-Abort-prepared-transactions-while-altering-two_ph.patch Description: v5-0003-Abort-prepared-transactions-while-altering-two_ph.patch v5-0004-Add-force_alter-option.patch Description: v5-0004-Add-force_alter-option.patch
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Vitaly, > I looked at the patch and realized that I can't try it easily in the near > future > because the solution I'm working on is based on PG16 or earlier. This patch is > not easily applicable to the older releases. I have to port my solution to the > master, which is not done yet. We also tried to port our patch for PG16, but the largest barrier was that a replication command ALTER_SLOT is not supported. Since the slot option two_phase can't be modified, it is difficult to skip decoding PREPARE command even when altering the option from true to false. IIUC, Adding a new feature (e.g., replication command) for minor updates is generally prohibited We must consider another approach for backpatching, but we do not have solutions for now. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/global/
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Apr 18, 2024 at 4:26 PM Ajin Cherian wrote: > > Attaching the patch for your review and comments. Big thanks to Kuroda-san > for also working on the patch. > > Looking at this a bit more, maybe rolling back all prepared transactions on the subscriber when toggling two_phase from true to false might not be desirable for the customer. Maybe we should have an option for customers to control whether transactions should be rolled back or not. Maybe transactions should only be rolled back if a "force" option is also set. What do people think? regards, Ajin Cherian Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Tue, Apr 16, 2024 at 4:25 PM Amit Kapila wrote: > > > > > Can you please once consider the idea shared by me at [1] (One naive > idea is that on the publisher .) to solve this problem? > > [1] - > https://www.postgresql.org/message-id/CAA4eK1K1fSkeK%3Dkc26G5cq87vQG4%3D1qs_b%2Bno4%2Bep654SeBy1w%40mail.gmail.com > > > Expanding on Amit's idea, we found out that there is already a mechanism in code to fully decode prepared transactions prior to a defined LSN where two_phase is enabled using the "two_phase_at" LSN in the slot. Look at ReorderBufferFinishPrepared() on how this is done. This code was not working as expected in our patch because we were setting two_phase on the slot to true as soon as the alter command was received. This was not the correct way, initially when two_phase is enabled, the two_phase changes to pending state and two_phase option on the slot should only be set to true when two_phase moves from pending to enabled. This will happen once the replication is restarted with two_phase option. Look at code in CreateDecodingContext() on how "two_phase_at" is set in the slot when done this way. So we changed the code to not remotely alter two_phase when toggling from false to true. With this change, now even if there are pending transactions on the publisher when toggling two_phase from false to true, these pending transactions will be fully decoded and sent once the commit prepared is decoded as the pending prepared transactions are prior to the "two_phase_at" LSN. With this patch, now we are able to handle both pending prepared transactions when altering two_phase from true to false as well as false to true. Attaching the patch for your review and comments. Big thanks to Kuroda-san for also working on the patch. regards, Ajin Cherian Fujitsu Australia. v4-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch Description: Binary data v4-0003-Abort-prepared-transactions-while-altering-two_ph.patch Description: Binary data v4-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: Binary data
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Tue, Apr 16, 2024 at 1:31 AM Давыдов Виталий wrote: > Dear All, > Just interested, does anyone tried to reproduce the problem with slow > catchup of twophase transactions (pgbench should be used with big number of > clients)? I haven't seen any messages from anyone other that me that the > problem takes place. > > > Yes, I was able to reproduce the slow catchup of twophase transactions with pgbench with 20 clients. regards, Ajin Cherian Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Tue, Apr 16, 2024 at 7:48 AM Hayato Kuroda (Fujitsu) wrote: > > > > FYI - We also considered the idea which walsender waits until all prepared > > transactions > > > are resolved before decoding and sending changes, but it did not work well > > > - the restarted walsender sent only COMMIT PREPARED record for > > transactions which > > > have been prepared before disabling the subscription. This happened > > > because > > > 1) if the two_phase option of slots is false, the confirmed_flush can be > > > ahead of > > >PREPARE record, and > > > 2) after the altering and restarting, start_decoding_at becomes same as > > >confirmed_flush and records behind this won't be decoded. > > > > > > > I don't understand the exact problem you are facing. IIUC, if the > > commit is after start_decoding_at point and prepare was before it, we > > expect to send the entire transaction followed by a commit record. The > > restart_lsn should be before the start of such a transaction and we > > should have recorded the changes in the reorder buffer. > > This behavior is right for two_phase = false case. But if the parameter is > altered between PREPARE and COMMIT PREPARED, there is a possibility that only > COMMIT PREPARED is sent. > Can you please once consider the idea shared by me at [1] (One naive idea is that on the publisher .) to solve this problem? [1] - https://www.postgresql.org/message-id/CAA4eK1K1fSkeK%3Dkc26G5cq87vQG4%3D1qs_b%2Bno4%2Bep654SeBy1w%40mail.gmail.com -- With Regards, Amit Kapila.
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Amit, > > FYI - We also considered the idea which walsender waits until all prepared > transactions > > are resolved before decoding and sending changes, but it did not work well > > - the restarted walsender sent only COMMIT PREPARED record for > transactions which > > have been prepared before disabling the subscription. This happened because > > 1) if the two_phase option of slots is false, the confirmed_flush can be > > ahead of > >PREPARE record, and > > 2) after the altering and restarting, start_decoding_at becomes same as > >confirmed_flush and records behind this won't be decoded. > > > > I don't understand the exact problem you are facing. IIUC, if the > commit is after start_decoding_at point and prepare was before it, we > expect to send the entire transaction followed by a commit record. The > restart_lsn should be before the start of such a transaction and we > should have recorded the changes in the reorder buffer. This behavior is right for two_phase = false case. But if the parameter is altered between PREPARE and COMMIT PREPARED, there is a possibility that only COMMIT PREPARED is sent. As the first place, the executed workload is below. 1. created a subscription with (two_phase = false) 2. prepared a transaction on publisher 3. disabled the subscription once 4. altered the subscription to two_phase = true 5. enabled the subscription again 6. did COMMIT PREPARED on the publisher -> Apply worker would raise an ERROR while applying COMMIT PREPARED record: ERROR: prepared transaction with identifier "pg_gid_XXX_YYY" does not exist Below part describes why the ERROR occurred. == ### Regarding 1) the confirmed_flush can be ahead of PREPARE record, If two_phase is off, as you might know, confirmed_flush can be ahead of PREPARE record by keepalive mechanism. Walsender sometimes sends a keepalive message in WalSndKeepalive(). Here the LSN is written, which is lastly decoded record. Since the PREPARE record is skipped (just handled by ReorderBufferProcessXid()), sometimes the written LSN in the message can be ahead of PREPARE record. If the WAL records are aligned like below, the LSN can point CHECKPOINT_ONLINE. ... INSERT PREPARE txn1 CHECKPOINT_ONLINE ... On worker side, when it receives the keepalive, it compares the LSN in the message and lastly received LSN, and advance last_received. Then, the worker replies to the walsender, and at that time it replies that last_recevied record has been flushed on the subscriber. See send_feedback(). On publisher, when the walsender receives the message from subscriber, it reads the message and advance the confirmed_flush to the written value. If the walsender sends LSN which locates ahead PREPARE, the confirmed flush is updated as well. ### Regarding 2) after the altering, records behind the confirmed_flush are not decoded Then, at decoding phase. The snapshot builder determines the point where decoding is resumed, as start_decoding_at. After the restart, the value is same as confirmed_flush of the slot. Since the confiremed_fluish is ahead of PREPARE, the start_decoding_at becomes ahead as well, so whole of prepared transactions are not decoded. == Attached zip file contains the PoC and used script. You can refer what I really did. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/ <>
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear All, On Wednesday, April 10, 2024 17:16 MSK, Давыдов Виталий wrote: Hi Amit, Ajin, All Thank you for the patch and the responses. I apologize for my delayed answer due to some curcumstances. On Wednesday, April 10, 2024 14:18 MSK, Amit Kapila wrote: Vitaly, does the minimal solution provided by the proposed patch (Allow to alter two_phase option of a subscriber provided no uncommitted prepared transactions are pending on that subscription.) address your use case?In general, the idea behind the patch seems to be suitable for my case. Furthermore, the case of two_phase switch from false to true with uncommitted pending prepared transactions probably never happens in my case. The switch from false to true means that the replica completes the catchup from the master and switches to the normal mode when it participates in the multi-node configuration. There should be no uncommitted pending prepared transactions at the moment of the switch to the normal mode. I'm going to try this patch. Give me please some time to investigate the patch. I will come with some feedback a little bit later. I looked at the patch and realized that I can't try it easily in the near future because the solution I'm working on is based on PG16 or earlier. This patch is not easily applicable to the older releases. I have to port my solution to the master, which is not done yet. I apologize for that - so much work should be done before applying the patch. BTW, I tested the idea with async 2PC commit on my side and it seems to work fine in my case. Anyway, I agree, the idea with altering of subscription seems the best one but much harder to implement. To summarise my case of a synchronous multimaster where twophase is used to implement global transactions: * Replica may have prepared but not committed transactions when I toggle subscription twophase from true to false. In this case, all prepared transactions may be aborted on the replica before altering the subscription. * Replica will not have prepared transactions when subscription is toggled from false to true. In this scenario, the replica completes the catchup (with twophase=off) and becomes the part of the multi-nodal cluster and is ready to accept new 2PC transactions. All the new pending transactions will wait until replica responds. But it may work differently for some other solutions. In general, it would be great to allow toggling for all scenarious.Just interested, does anyone tried to reproduce the problem with slow catchup of twophase transactions (pgbench should be used with big number of clients)? I haven't seen any messages from anyone other that me that the problem takes place. Thank you for your help! With best regards, Vitaly
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Mon, Apr 15, 2024 at 1:28 PM Hayato Kuroda (Fujitsu) wrote: > > > Vitaly, does the minimal solution provided by the proposed patch > > (Allow to alter two_phase option of a subscriber provided no > > uncommitted > > prepared transactions are pending on that subscription.) address your use > > case? > > I think we do not have to handle cases which there are prepared transactions > on > publisher/subscriber, as the first step. It leads additional complexity and we > do not have smarter solutions, especially for problem 2. > IIUC it meets the Vitaly's condition, right? > > > > 1. While toggling two_phase from true to false, we could probably get a > > > list of > > prepared transactions for this subscriber id and rollback/abort the prepared > > transactions. This will allow the transactions to be re-applied like a > > normal > > transaction when the commit comes. Alternatively, if this isn't appropriate > > doing it > > in the ALTER SUBSCRIPTION context, we could store the xids of all prepared > > transactions of this subscription in a list and when the corresponding xid > > is being > > committed by the apply worker, prior to commit, we make sure the previously > > prepared transaction is rolled back. But this would add the overhead of > > checking > > this list every time a transaction is committed by the apply worker. > > > > > > > In the second solution, if you check at the time of commit whether > > there exists a prior prepared transaction then won't we end up > > applying the changes twice? I think we can first try to achieve it at > > the time of Alter Subscription because the other solution can add > > overhead at each commit? > > Yeah, at least the second solution might be problematic. I prototyped > the first one and worked well. However, to make the feature more consistent, > it is prohibit to exist prepared transactions on subscriber for now. > We can ease based on the requirement. > > > > 2. No solution yet. > > > > > > > One naive idea is that on the publisher we can remember whether the > > prepare has been sent and if so then only send commit_prepared, > > otherwise send the entire transaction. On the subscriber-side, we > > somehow, need to ensure before applying the first change whether the > > corresponding transaction is already prepared and if so then skip the > > changes and just perform the commit prepared. One drawback of this > > approach is that after restart, the prepare flag wouldn't be saved in > > the memory and we end up sending the entire transaction again. One way > > to avoid this overhead is that the publisher before sending the entire > > transaction checks with subscriber whether it has a prepared > > transaction corresponding to the current commit. I understand that > > this is not a good idea even if it works but I don't have any better > > ideas. What do you think? > > I considered but not sure it is good to add such mechanism. Your idea requires > additional wait-loop, which might lead bugs and unexpected behavior. And it > may > degrade the performance based on the network environment. > As for the another solution (worker sends a list of prepared transactions), it > is also not so good because list of prepared transactions may be huge. > > Based on above, I think we can reject the case for now. > > FYI - We also considered the idea which walsender waits until all prepared > transactions > are resolved before decoding and sending changes, but it did not work well > - the restarted walsender sent only COMMIT PREPARED record for transactions > which > have been prepared before disabling the subscription. This happened because > 1) if the two_phase option of slots is false, the confirmed_flush can be > ahead of >PREPARE record, and > 2) after the altering and restarting, start_decoding_at becomes same as >confirmed_flush and records behind this won't be decoded. > I don't understand the exact problem you are facing. IIUC, if the commit is after start_decoding_at point and prepare was before it, we expect to send the entire transaction followed by a commit record. The restart_lsn should be before the start of such a transaction and we should have recorded the changes in the reorder buffer. -- With Regards, Amit Kapila.
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Amit, > Vitaly, does the minimal solution provided by the proposed patch > (Allow to alter two_phase option of a subscriber provided no > uncommitted > prepared transactions are pending on that subscription.) address your use > case? I think we do not have to handle cases which there are prepared transactions on publisher/subscriber, as the first step. It leads additional complexity and we do not have smarter solutions, especially for problem 2. IIUC it meets the Vitaly's condition, right? > > 1. While toggling two_phase from true to false, we could probably get a > > list of > prepared transactions for this subscriber id and rollback/abort the prepared > transactions. This will allow the transactions to be re-applied like a normal > transaction when the commit comes. Alternatively, if this isn't appropriate > doing it > in the ALTER SUBSCRIPTION context, we could store the xids of all prepared > transactions of this subscription in a list and when the corresponding xid is > being > committed by the apply worker, prior to commit, we make sure the previously > prepared transaction is rolled back. But this would add the overhead of > checking > this list every time a transaction is committed by the apply worker. > > > > In the second solution, if you check at the time of commit whether > there exists a prior prepared transaction then won't we end up > applying the changes twice? I think we can first try to achieve it at > the time of Alter Subscription because the other solution can add > overhead at each commit? Yeah, at least the second solution might be problematic. I prototyped the first one and worked well. However, to make the feature more consistent, it is prohibit to exist prepared transactions on subscriber for now. We can ease based on the requirement. > > 2. No solution yet. > > > > One naive idea is that on the publisher we can remember whether the > prepare has been sent and if so then only send commit_prepared, > otherwise send the entire transaction. On the subscriber-side, we > somehow, need to ensure before applying the first change whether the > corresponding transaction is already prepared and if so then skip the > changes and just perform the commit prepared. One drawback of this > approach is that after restart, the prepare flag wouldn't be saved in > the memory and we end up sending the entire transaction again. One way > to avoid this overhead is that the publisher before sending the entire > transaction checks with subscriber whether it has a prepared > transaction corresponding to the current commit. I understand that > this is not a good idea even if it works but I don't have any better > ideas. What do you think? I considered but not sure it is good to add such mechanism. Your idea requires additional wait-loop, which might lead bugs and unexpected behavior. And it may degrade the performance based on the network environment. As for the another solution (worker sends a list of prepared transactions), it is also not so good because list of prepared transactions may be huge. Based on above, I think we can reject the case for now. FYI - We also considered the idea which walsender waits until all prepared transactions are resolved before decoding and sending changes, but it did not work well - the restarted walsender sent only COMMIT PREPARED record for transactions which have been prepared before disabling the subscription. This happened because 1) if the two_phase option of slots is false, the confirmed_flush can be ahead of PREPARE record, and 2) after the altering and restarting, start_decoding_at becomes same as confirmed_flush and records behind this won't be decoded. > > 3. We could mandate that the altering of two_phase state only be done after > disabling the subscription, just like how it is handled for failover option. > > > > makes sense. OK, this spec was added. According to above, I updated the patch with Ajin. 0001 - extends ALTER SUBSCRIPTION statement. A tab-completion was added. 0002 - mandates the subscription has been disabled. Since no need to change AtEOXact_ApplyLauncher(), the change is reverted. If no objections, this can be included to 0001. 0003 - checks whether there are transactions prepared by the worker. If found, rejects the ALTER SUBSCRIPTION command. 0004 - checks whether there are transactions prepared on publisher. The backend connects to the publisher and confirms it. If found, rejects the ALTER SUBSCRIPTION command. 0005 - adds TAP test for it. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/ v3-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: v3-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch v3-0002-Mandate-the-subscription-has-been-disabled.patch Description: v3-0002-Mandate-the-subscription-has-been-disabled.patch v3-0003-Prohibit-altering-from-true-to-false-if-there-are.patch Description:
RE: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear Amit, > One naive idea is that on the publisher we can remember whether the > prepare has been sent and if so then only send commit_prepared, > otherwise send the entire transaction. On the subscriber-side, we > somehow, need to ensure before applying the first change whether the > corresponding transaction is already prepared and if so then skip the > changes and just perform the commit prepared. One drawback of this > approach is that after restart, the prepare flag wouldn't be saved in > the memory and we end up sending the entire transaction again. One way > to avoid this overhead is that the publisher before sending the entire > transaction checks with subscriber whether it has a prepared > transaction corresponding to the current commit. I understand that > this is not a good idea even if it works but I don't have any better > ideas. What do you think? Alternative idea is that worker pass a list of prepared transaction as new option in START_REPLICATION. This can reduce the number of inter-node communications, but sometimes the list may be huge. Best Regards, Hayato Kuroda FUJITSU LIMITED https://www.fujitsu.com/
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, Ajin, All Thank you for the patch and the responses. I apologize for my delayed answer due to some curcumstances. On Wednesday, April 10, 2024 14:18 MSK, Amit Kapila wrote: Vitaly, does the minimal solution provided by the proposed patch (Allow to alter two_phase option of a subscriber provided no uncommitted prepared transactions are pending on that subscription.) address your use case?In general, the idea behind the patch seems to be suitable for my case. Furthermore, the case of two_phase switch from false to true with uncommitted pending prepared transactions probably never happens in my case. The switch from false to true means that the replica completes the catchup from the master and switches to the normal mode when it participates in the multi-node configuration. There should be no uncommitted pending prepared transactions at the moment of the switch to the normal mode. I'm going to try this patch. Give me please some time to investigate the patch. I will come with some feedback a little bit later. Thank you for your help! With best regards, Vitaly Davydov
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Fri, Apr 5, 2024 at 4:59 PM Ajin Cherian wrote: > > On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila wrote: >> >> >> I think this would probably be better than the current situation but >> can we think of a solution to allow toggling the value of two_phase >> even when prepared transactions are present? Can you please summarize >> the reason for the problems in doing that and the solutions, if any? >> > > > Updated the patch, as it wasn't addressing updating of two-phase in the > remote slot. > Vitaly, does the minimal solution provided by the proposed patch (Allow to alter two_phase option of a subscriber provided no uncommitted prepared transactions are pending on that subscription.) address your use case? > Currently the main issue that needs to be handled is the handling of pending > prepared transactions while the two_phase is altered. I see 3 issues with the > current approach. > > 1. Uncommitted prepared transactions when toggling two_phase from true to > false > When two_phase was true, prepared transactions were decoded at PREPARE time > and send to the subscriber, which is then prepared on the subscriber with a > new gid. Once the two_phase is toggled to false, then the COMMIT PREPARED on > the publisher is converted to commit and the entire transaction is decoded > and sent to the subscriber. This will leave the previously prepared > transaction pending. > > 2. Uncommitted prepared transactions when toggling two_phase form false to > true > When two_phase was false, prepared transactions were ignored and not > decoded at PREPARE time on the publisher. Once the two_phase is toggled to > true, the apply worker and the walsender are restarted and a replication is > restarted from a new "start_decoding_at" LSN. Now, this new > "start_decoding_at" could be past the LSN of the PREPARE record and if so, > the PREPARE record is skipped and not send to the subscriber. Look at > comments in DecodeTXNNeedSkip() for detail. Later when the user issues > COMMIT PREPARED, this is decoded and sent to the subscriber. but there is no > prepared transaction on the subscriber, and this fails because the > corresponding gid of the transaction couldn't be found. > > 3. While altering the two_phase of the subscription, it is required to also > alter the two_phase field of the slot on the primary. The subscription cannot > remotely alter the two_phase option of the slot when the subscription is > enabled, as the slot is owned by the walsender on the publisher side. > Thanks for summarizing the reasons for not allowing altering the two_pc property for a subscription. > Possible solutions for the 3 problems: > > 1. While toggling two_phase from true to false, we could probably get a list > of prepared transactions for this subscriber id and rollback/abort the > prepared transactions. This will allow the transactions to be re-applied like > a normal transaction when the commit comes. Alternatively, if this isn't > appropriate doing it in the ALTER SUBSCRIPTION context, we could store the > xids of all prepared transactions of this subscription in a list and when the > corresponding xid is being committed by the apply worker, prior to commit, we > make sure the previously prepared transaction is rolled back. But this would > add the overhead of checking this list every time a transaction is committed > by the apply worker. > In the second solution, if you check at the time of commit whether there exists a prior prepared transaction then won't we end up applying the changes twice? I think we can first try to achieve it at the time of Alter Subscription because the other solution can add overhead at each commit? > 2. No solution yet. > One naive idea is that on the publisher we can remember whether the prepare has been sent and if so then only send commit_prepared, otherwise send the entire transaction. On the subscriber-side, we somehow, need to ensure before applying the first change whether the corresponding transaction is already prepared and if so then skip the changes and just perform the commit prepared. One drawback of this approach is that after restart, the prepare flag wouldn't be saved in the memory and we end up sending the entire transaction again. One way to avoid this overhead is that the publisher before sending the entire transaction checks with subscriber whether it has a prepared transaction corresponding to the current commit. I understand that this is not a good idea even if it works but I don't have any better ideas. What do you think? > 3. We could mandate that the altering of two_phase state only be done after > disabling the subscription, just like how it is handled for failover option. > makes sense. -- With Regards, Amit Kapila.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila wrote: > > I think this would probably be better than the current situation but > can we think of a solution to allow toggling the value of two_phase > even when prepared transactions are present? Can you please summarize > the reason for the problems in doing that and the solutions, if any? > > -- > With Regards, > Amit Kapila. > Updated the patch, as it wasn't addressing updating of two-phase in the remote slot. Currently the main issue that needs to be handled is the handling of pending prepared transactions while the two_phase is altered. I see 3 issues with the current approach. 1. Uncommitted prepared transactions when toggling two_phase from true to false When two_phase was true, prepared transactions were decoded at PREPARE time and send to the subscriber, which is then prepared on the subscriber with a new gid. Once the two_phase is toggled to false, then the COMMIT PREPARED on the publisher is converted to commit and the entire transaction is decoded and sent to the subscriber. This will leave the previously prepared transaction pending. 2. Uncommitted prepared transactions when toggling two_phase form false to true When two_phase was false, prepared transactions were ignored and not decoded at PREPARE time on the publisher. Once the two_phase is toggled to true, the apply worker and the walsender are restarted and a replication is restarted from a new "start_decoding_at" LSN. Now, this new "start_decoding_at" could be past the LSN of the PREPARE record and if so, the PREPARE record is skipped and not send to the subscriber. Look at comments in DecodeTXNNeedSkip() for detail. Later when the user issues COMMIT PREPARED, this is decoded and sent to the subscriber. but there is no prepared transaction on the subscriber, and this fails because the corresponding gid of the transaction couldn't be found. 3. While altering the two_phase of the subscription, it is required to also alter the two_phase field of the slot on the primary. The subscription cannot remotely alter the two_phase option of the slot when the subscription is enabled, as the slot is owned by the walsender on the publisher side. Possible solutions for the 3 problems: 1. While toggling two_phase from true to false, we could probably get list of prepared transactions for this subscriber id and rollback/abort the prepared transactions. This will allow the transactions to be re-applied like a normal transaction when the commit comes. Alternatively, if this isn't appropriate doing it in the ALTER SUBSCRIPTION context, we could store the xids of all prepared transactions of this subscription in a list and when the corresponding xid is being committed by the apply worker, prior to commit, we make sure the previously prepared transaction is rolled back. But this would add the overhead of checking this list every time a transaction is committed by the apply worker. 2. No solution yet. 3. We could mandate that the altering of two_phase state only be done after disabling the subscription, just like how it is handled for failover option. Let me know your thoughts. regards, Ajin Cherian Fujitsu Australia v2-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch Description: Binary data
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Apr 4, 2024 at 10:53 AM Ajin Cherian wrote: > > On Wed, Mar 6, 2024 at 1:29 AM Давыдов Виталий > wrote: >> >> In usual work, the subscription has two_phase = on. I have to change this >> option at catchup stage only, but this parameter can not be altered. There >> was a patch proposal in past to implement altering of two_phase option, but >> it was rejected. I think, the recreation of the subscription with two_phase >> = off will not work. >> >> > > The altering of two_phase was restricted because if there was a previously > prepared transaction on the subscriber when the two_phase was on, and then it > was turned off, the apply worker on the subscriber would re-apply the > transaction a second time and this might result in an inconsistent replica. > Here's a patch that allows toggling two_phase option provided that there are > no pending uncommitted prepared transactions on the subscriber for that > subscription. > I think this would probably be better than the current situation but can we think of a solution to allow toggling the value of two_phase even when prepared transactions are present? Can you please summarize the reason for the problems in doing that and the solutions, if any? -- With Regards, Amit Kapila.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Wed, Mar 6, 2024 at 1:29 AM Давыдов Виталий wrote: > In usual work, the subscription has two_phase = on. I have to change this > option at catchup stage only, but this parameter can not be altered. There > was a patch proposal in past to implement altering of two_phase option, but > it was rejected. I think, the recreation of the subscription with two_phase > = off will not work. > > > The altering of two_phase was restricted because if there was a previously prepared transaction on the subscriber when the two_phase was on, and then it was turned off, the apply worker on the subscriber would re-apply the transaction a second time and this might result in an inconsistent replica. Here's a patch that allows toggling two_phase option provided that there are no pending uncommitted prepared transactions on the subscriber for that subscription. Thanks to Kuroda-san for working on the patch. regards, Ajin Cherian Fujitsu Australia v1-0001-Allow-altering-of-two_phase-option-in-subscribers.patch Description: Binary data
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Tue, Mar 5, 2024 at 7:59 PM Давыдов Виталий wrote: > > Thank you for the reply. > > On Tuesday, March 05, 2024 12:05 MSK, Heikki Linnakangas > wrote: > > > In a nutshell, this changes PREPARE TRANSACTION so that if > synchronous_commit is 'off', the PREPARE TRANSACTION is not fsync'd to > disk. So if you crash after the PREPARE TRANSACTION has returned, the > transaction might be lost. I think that's completely unacceptable. > > > You are right, the prepared transaction might be lost after crash. The same > may happen with regular transactions that are not fsync-ed on replica in > logical replication by default. The subscription parameter synchronous_commit > is OFF by default. I'm not sure, is there some auto recovery for regular > transactions? > Unless the commit WAL is not flushed, we wouldn't have updated the replication origin's LSN and neither the walsender would increase the confirmed_flush_lsn for the corresponding slot till the commit is flushed on subscriber. So, if the subscriber crashed before flushing the commit record, server should send the same transaction again. The same should be true for prepared transaction stuff as well. -- With Regards, Amit Kapila.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Heikki, Thank you for the reply. On Tuesday, March 05, 2024 12:05 MSK, Heikki Linnakangas wrote: In a nutshell, this changes PREPARE TRANSACTION so that if synchronous_commit is 'off', the PREPARE TRANSACTION is not fsync'd to disk. So if you crash after the PREPARE TRANSACTION has returned, the transaction might be lost. I think that's completely unacceptable. You are right, the prepared transaction might be lost after crash. The same may happen with regular transactions that are not fsync-ed on replica in logical replication by default. The subscription parameter synchronous_commit is OFF by default. I'm not sure, is there some auto recovery for regular transactions? I think, the main difference between these two cases - how to manually recover when some PREPARE TRANSACTION or COMMIT PREPARED are lost. For regular transactions, some updates or deletes in tables on replica may be enough to fix the problem. For twophase transactions, it may be harder to fix it by hands, but it is possible, I believe. If you create a custom solution that is based on twophase transactions (like multimaster) such auto recovery may happen automatically. Another solution is to ignore errors on commit prepared if the corresponding prepared tx is missing. I don't know other risks that may happen with async commit of twophase transactions. If you're ok to lose the prepared state of twophase transactions on crash, why don't you create the subscription with 'two_phase=off' to begin with?In usual work, the subscription has two_phase = on. I have to change this option at catchup stage only, but this parameter can not be altered. There was a patch proposal in past to implement altering of two_phase option, but it was rejected. I think, the recreation of the subscription with two_phase = off will not work. I believe, async commit for twophase transactions on catchup will significantly improve the catchup performance. It is worth to think about such feature. P.S. We might introduce a GUC option to allow async commit for twophase transactions. By default, sync commit will be applied for twophase transactions, as it is now. With best regards, Vitaly Davydov
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On 29/02/2024 19:34, Давыдов Виталий wrote: Dear All, Consider, please, my patch for async commit for twophase transactions. It can be applicable when catchup performance is not enought with publication parameter twophase = on. The key changes are: * Use XLogSetAsyncXactLSN instead of XLogFlush as it is for usual transactions. * In case of async commit only, save 2PC state in the pg_twophase file (but not fsync it) in addition to saving in the WAL. The file is used as an alternative to storing 2pc state in the memory. * On recovery, reject pg_twophase files with future xids. Probably, 2PC async commit should be enabled by a GUC (not implemented in the patch). In a nutshell, this changes PREPARE TRANSACTION so that if synchronous_commit is 'off', the PREPARE TRANSACTION is not fsync'd to disk. So if you crash after the PREPARE TRANSACTION has returned, the transaction might be lost. I think that's completely unacceptable. If you're ok to lose the prepared state of twophase transactions on crash, why don't you create the subscription with 'two_phase=off' to begin with? -- Heikki Linnakangas Neon (https://neon.tech)
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Dear All, Consider, please, my patch for async commit for twophase transactions. It can be applicable when catchup performance is not enought with publication parameter twophase = on. The key changes are: * Use XLogSetAsyncXactLSN instead of XLogFlush as it is for usual transactions. * In case of async commit only, save 2PC state in the pg_twophase file (but not fsync it) in addition to saving in the WAL. The file is used as an alternative to storing 2pc state in the memory. * On recovery, reject pg_twophase files with future xids.Probably, 2PC async commit should be enabled by a GUC (not implemented in the patch). With best regards, Vitaly From cbaaa7270d771f9ccd6def08f0f02ce61dc15ff6 Mon Sep 17 00:00:00 2001 From: Vitaly Davydov Date: Thu, 29 Feb 2024 18:58:13 +0300 Subject: [PATCH] Async commit support for twophase transactions --- src/backend/access/transam/twophase.c | 171 +- 1 file changed, 138 insertions(+), 33 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 234c8d08eb..352266be14 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -109,6 +109,8 @@ #include "utils/memutils.h" #include "utils/timestamp.h" +#define POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT + /* * Directory where Two-phase commit files reside within PGDATA */ @@ -169,6 +171,7 @@ typedef struct GlobalTransactionData BackendId locking_backend; /* backend currently working on the xact */ bool valid; /* true if PGPROC entry is in proc array */ bool ondisk; /* true if prepare state file is on disk */ + bool infile; /* true if prepared state saved in file (but not fsync-ed) */ bool inredo; /* true if entry was added via xlog_redo */ char gid[GIDSIZE]; /* The GID assigned to the prepared xact */ } GlobalTransactionData; @@ -227,12 +230,14 @@ static void RemoveGXact(GlobalTransaction gxact); static void XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len); static char *ProcessTwoPhaseBuffer(TransactionId xid, XLogRecPtr prepare_start_lsn, - bool fromdisk, bool setParent, bool setNextXid); + bool fromdisk, bool setParent, bool setNextXid, + const char *filename); static void MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid, const char *gid, TimestampTz prepared_at, Oid owner, Oid databaseid); static void RemoveTwoPhaseFile(TransactionId xid, bool giveWarning); -static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len); +static void RemoveTwoPhaseFileByName(const char *filename, bool giveWarning); +static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len, bool dosync); /* * Initialization of shared memory @@ -427,6 +432,7 @@ MarkAsPreparing(TransactionId xid, const char *gid, MarkAsPreparingGuts(gxact, xid, gid, prepared_at, owner, databaseid); + gxact->infile = false; gxact->ondisk = false; /* And insert it into the active array */ @@ -1204,6 +1210,37 @@ EndPrepare(GlobalTransaction gxact) (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("two-phase state file maximum length exceeded"))); +#ifdef POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT + + Assert(gxact->infile == false); + + if (synchronous_commit == SYNCHRONOUS_COMMIT_OFF) + { + char *buf; + size_t len = 0; + size_t offset = 0; + + for (record = records.head; record != NULL; record = record->next) + len += record->len; + + if (len > 0) + { + buf = palloc(len); + + for (record = records.head; record != NULL; record = record->next) + { +memcpy(buf + offset, record->data, record->len); +offset += record->len; + } + + RecreateTwoPhaseFile(gxact->xid, buf, len, false); + pfree(buf); + gxact->infile = true; + } + } + +#endif + /* * Now writing 2PC state data to WAL. We let the WAL's CRC protection * cover us, so no need to calculate a separate CRC. @@ -1239,8 +1276,24 @@ EndPrepare(GlobalTransaction gxact) gxact->prepare_end_lsn); } +#if !defined(POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT) + XLogFlush(gxact->prepare_end_lsn); +#else + + if (synchronous_commit > SYNCHRONOUS_COMMIT_OFF) + { + /* Flush XLOG to disk */ + XLogFlush(gxact->prepare_end_lsn); + } + else + { + XLogSetAsyncXactLSN(gxact->prepare_end_lsn); + } + +#endif + /* If we crash now, we have prepared: WAL replay will fix things */ /* Store record's start location to read that later on Commit */ @@ -1546,12 +1599,11 @@ FinishPreparedTransaction(const char *gid, bool isCommit) * in WAL files if the LSN is after the last checkpoint record, or moved * to disk if for some reason they have lived for a long time. */ - if (gxact->ondisk) + if (gxact->infile || gxact->ondisk) buf = ReadTwoPhaseFile(xid, false); else XlogReadTwoPhaseData(gxact->prepare_start_lsn, , NULL); - /* * Disassemble the header area */ @@
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, On Tuesday, February 27, 2024 16:00 MSK, Amit Kapila wrote: As we do XLogFlush() at the time of prepare then why it is not available? OR are you talking about this state after your idea/patch where you are trying to make both Prepare and Commit_prepared records async?Right, I'm talking about my patch where async commit is implemented. There is no such problem with reading 2PC from not flushed WAL in the vanilla because XLogFlush is called unconditionally, as you've described. But an attempt to add some async stuff leads to the problem of reading not flushed WAL. It is why I store 2pc state in the local memory in my patch. It would be good if you could link those threads.Sure, I will find and add some links to the discussions from past. Thank you! With best regards, Vitaly On Tue, Feb 27, 2024 at 4:49 PM Давыдов Виталий wrote: > > Thank you for your interest in the discussion! > > On Monday, February 26, 2024 16:24 MSK, Amit Kapila > wrote: > > > I think the reason is probably that when the WAL record for prepared is > already flushed then what will be the idea of async commit here? > > I think, the idea of async commit should be applied for both transactions: > PREPARE and COMMIT PREPARED, which are actually two separate local > transactions. For both these transactions we may call XLogSetAsyncXactLSN on > commit instead of XLogFlush when async commit is enabled. When I use async > commit, I mean to apply async commit to local transactions, not to a twophase > (prepared) transaction itself. > > > At commit prepared, it seems we read prepare's WAL record, right? If so, it > is not clear to me do you see a problem with a flush of commit_prepared or > reading WAL for prepared or both of these. > > The problem with reading WAL is due to async commit of PREPARE TRANSACTION > which saves 2PC in the WAL. At the moment of COMMIT PREPARED the WAL with > PREPARE TRANSACTION 2PC state may not be XLogFlush-ed yet. > As we do XLogFlush() at the time of prepare then why it is not available? OR are you talking about this state after your idea/patch where you are trying to make both Prepare and Commit_prepared records async? So, PREPARE TRANSACTION should wait until its 2PC state is flushed. > > I did some experiments with saving 2PC state in the local memory of logical > replication worker and, I think, it worked and demonstrated much better > performance. Logical replication worker utilized up to 100% CPU. I'm just > concerned about possible problems with async commit for twophase transactions. > > To be more specific, I've attached a patch to support async commit for > twophase. It is not the final patch but it is presented only for discussion > purposes. There were some attempts to save 2PC in memory in past but it was > rejected. > It would be good if you could link those threads. -- With Regards, Amit Kapila.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Tue, Feb 27, 2024 at 4:49 PM Давыдов Виталий wrote: > > Thank you for your interest in the discussion! > > On Monday, February 26, 2024 16:24 MSK, Amit Kapila > wrote: > > > I think the reason is probably that when the WAL record for prepared is > already flushed then what will be the idea of async commit here? > > I think, the idea of async commit should be applied for both transactions: > PREPARE and COMMIT PREPARED, which are actually two separate local > transactions. For both these transactions we may call XLogSetAsyncXactLSN on > commit instead of XLogFlush when async commit is enabled. When I use async > commit, I mean to apply async commit to local transactions, not to a twophase > (prepared) transaction itself. > > > At commit prepared, it seems we read prepare's WAL record, right? If so, it > is not clear to me do you see a problem with a flush of commit_prepared or > reading WAL for prepared or both of these. > > The problem with reading WAL is due to async commit of PREPARE TRANSACTION > which saves 2PC in the WAL. At the moment of COMMIT PREPARED the WAL with > PREPARE TRANSACTION 2PC state may not be XLogFlush-ed yet. > As we do XLogFlush() at the time of prepare then why it is not available? OR are you talking about this state after your idea/patch where you are trying to make both Prepare and Commit_prepared records async? So, PREPARE TRANSACTION should wait until its 2PC state is flushed. > > I did some experiments with saving 2PC state in the local memory of logical > replication worker and, I think, it worked and demonstrated much better > performance. Logical replication worker utilized up to 100% CPU. I'm just > concerned about possible problems with async commit for twophase transactions. > > To be more specific, I've attached a patch to support async commit for > twophase. It is not the final patch but it is presented only for discussion > purposes. There were some attempts to save 2PC in memory in past but it was > rejected. > It would be good if you could link those threads. -- With Regards, Amit Kapila.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, Thank you for your interest in the discussion! On Monday, February 26, 2024 16:24 MSK, Amit Kapila wrote: I think the reason is probably that when the WAL record for prepared is already flushed then what will be the idea of async commit here?I think, the idea of async commit should be applied for both transactions: PREPARE and COMMIT PREPARED, which are actually two separate local transactions. For both these transactions we may call XLogSetAsyncXactLSN on commit instead of XLogFlush when async commit is enabled. When I use async commit, I mean to apply async commit to local transactions, not to a twophase (prepared) transaction itself. At commit prepared, it seems we read prepare's WAL record, right? If so, it is not clear to me do you see a problem with a flush of commit_prepared or reading WAL for prepared or both of these.The problem with reading WAL is due to async commit of PREPARE TRANSACTION which saves 2PC in the WAL. At the moment of COMMIT PREPARED the WAL with PREPARE TRANSACTION 2PC state may not be XLogFlush-ed yet. So, PREPARE TRANSACTION should wait until its 2PC state is flushed. I did some experiments with saving 2PC state in the local memory of logical replication worker and, I think, it worked and demonstrated much better performance. Logical replication worker utilized up to 100% CPU. I'm just concerned about possible problems with async commit for twophase transactions. To be more specific, I've attached a patch to support async commit for twophase. It is not the final patch but it is presented only for discussion purposes. There were some attempts to save 2PC in memory in past but it was rejected. Now, there might be the second round to discuss it. With best regards, Vitaly From 549f809fa122ca0842ec4bfc775afd08feee0d80 Mon Sep 17 00:00:00 2001 From: Vitaly Davydov Date: Tue, 27 Feb 2024 14:02:23 +0300 Subject: [PATCH] Add asynchronous commit support for 2PC --- src/backend/access/transam/twophase.c | 111 +- 1 file changed, 108 insertions(+), 3 deletions(-) diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index c6af8cfd7e..52f0853db8 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -109,6 +109,8 @@ #include "utils/memutils.h" #include "utils/timestamp.h" +#define POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT + /* * Directory where Two-phase commit files reside within PGDATA */ @@ -163,6 +165,9 @@ typedef struct GlobalTransactionData */ XLogRecPtr prepare_start_lsn; /* XLOG offset of prepare record start */ XLogRecPtr prepare_end_lsn; /* XLOG offset of prepare record end */ + void* prepare_2pc_mem_data; + size_t prepare_2pc_mem_len; + pid_t prepare_2pc_proc; TransactionId xid; /* The GXACT id */ Oid owner; /* ID of user that executed the xact */ @@ -427,6 +432,9 @@ MarkAsPreparing(TransactionId xid, const char *gid, MarkAsPreparingGuts(gxact, xid, gid, prepared_at, owner, databaseid); + Assert(gxact->prepare_2pc_mem_data == NULL); + Assert(gxact->prepare_2pc_proc == 0); + gxact->ondisk = false; /* And insert it into the active array */ @@ -1129,6 +1137,8 @@ StartPrepare(GlobalTransaction gxact) } } +extern bool IsLogicalWorker(void); + /* * Finish preparing state data and writing it to WAL. */ @@ -1167,6 +1177,37 @@ EndPrepare(GlobalTransaction gxact) (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), errmsg("two-phase state file maximum length exceeded"))); + Assert(gxact->prepare_2pc_mem_data == NULL); + Assert(gxact->prepare_2pc_proc == 0); + + if (IsLogicalWorker()) + { + size_t len = 0; + size_t offset = 0; + + for (record = records.head; record != NULL; record = record->next) + len += record->len; + + if (len > 0) + { + MemoryContext oldmemctx; + + oldmemctx = MemoryContextSwitchTo(TopMemoryContext); + + gxact->prepare_2pc_mem_data = palloc(len); + gxact->prepare_2pc_mem_len = len; + gxact->prepare_2pc_proc = getpid(); + + for (record = records.head; record != NULL; record = record->next) + { +memcpy((char *)gxact->prepare_2pc_mem_data + offset, record->data, record->len); +offset += record->len; + } + + MemoryContextSwitchTo(oldmemctx); + } + } + /* * Now writing 2PC state data to WAL. We let the WAL's CRC protection * cover us, so no need to calculate a separate CRC. @@ -1202,8 +1243,24 @@ EndPrepare(GlobalTransaction gxact) gxact->prepare_end_lsn); } +#if !defined(POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT) + XLogFlush(gxact->prepare_end_lsn); +#else + + if (synchronous_commit > SYNCHRONOUS_COMMIT_OFF) + { + /* Flush XLOG to disk */ + XLogFlush(gxact->prepare_end_lsn); + } + else + { + XLogSetAsyncXactLSN(gxact->prepare_end_lsn); + } + +#endif + /* If we crash now, we have prepared: WAL replay will fix things */ /* Store record's start location to read that later on Commit */ @@
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Fri, Feb 23, 2024 at 10:41 PM Давыдов Виталий wrote: > > Amit Kapila wrote: > > I don't see we do anything specific for 2PC transactions to make them behave > differently than regular transactions with respect to synchronous_commit > setting. What makes you think so? Can you pin point the code you are > referring to? > > Yes, sure. The function RecordTransactionCommitPrepared is called on prepared > transaction commit (twophase.c). It calls XLogFlush unconditionally. The > function RecordTransactionCommit (for regular transactions, xact.c) calls > XLogFlush if synchronous_commit > OFF, otherwise it calls XLogSetAsyncXactLSN. > > There is some comment in RecordTransactionCommitPrepared (by Bruce Momjian) > that shows that async commit is not supported yet: > /* > * We don't currently try to sleep before flush here ... nor is there any > * support for async commit of a prepared xact (the very idea is probably > * a contradiction) > */ > /* Flush XLOG to disk */ > XLogFlush(recptr); > It seems this comment is added in the commit 4a78cdeb where we added async commit support. I think the reason is probably that when the WAL record for prepared is already flushed then what will be the idea of async commit here? > Right, I think for this we need to implement parallel apply. > > Yes, parallel apply is a good point. But, I believe, it will not work if > asynchronous commit is not supported. You have only one receiver process > which should dispatch incoming messages to parallel workers. I guess, you > will never reach such rate of parallel execution on replica as on the master > with multiple backends. > > > Can you be a bit more specific about what exactly you have in mind to achieve > the above solutions? > > My proposal is to implement async commit for 2PC transactions as it is for > regular transactions. It should significantly speedup the catchup process. > Then, think how to apply in parallel, which is much diffcult to do. The > current problem is to get 2PC state from the WAL on commit prepared. At this > moment, the WAL is not flushed yet, commit function waits until WAL with 2PC > state is to be flushed. I just tried to do it in my sandbox and found such a > problem. Inability to get 2PC state from unflushed WAL stops me right now. I > think about possible solutions. > At commit prepared, it seems we read prepare's WAL record, right? If so, it is not clear to me do you see a problem with a flush of commit_prepared or reading WAL for prepared or both of these. -- With Regards, Amit Kapila.
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Amit, Amit Kapila wrote: I don't see we do anything specific for 2PC transactions to make them behave differently than regular transactions with respect to synchronous_commit setting. What makes you think so? Can you pin point the code you are referring to?Yes, sure. The function RecordTransactionCommitPrepared is called on prepared transaction commit (twophase.c). It calls XLogFlush unconditionally. The function RecordTransactionCommit (for regular transactions, xact.c) calls XLogFlush if synchronous_commit > OFF, otherwise it calls XLogSetAsyncXactLSN. There is some comment in RecordTransactionCommitPrepared (by Bruce Momjian) that shows that async commit is not supported yet: /* * We don't currently try to sleep before flush here ... nor is there any * support for async commit of a prepared xact (the very idea is probably * a contradiction) */ /* Flush XLOG to disk */ XLogFlush(recptr); Right, I think for this we need to implement parallel apply.Yes, parallel apply is a good point. But, I believe, it will not work if asynchronous commit is not supported. You have only one receiver process which should dispatch incoming messages to parallel workers. I guess, you will never reach such rate of parallel execution on replica as on the master with multiple backends. Can you be a bit more specific about what exactly you have in mind to achieve the above solutions?My proposal is to implement async commit for 2PC transactions as it is for regular transactions. It should significantly speedup the catchup process. Then, think how to apply in parallel, which is much diffcult to do. The current problem is to get 2PC state from the WAL on commit prepared. At this moment, the WAL is not flushed yet, commit function waits until WAL with 2PC state is to be flushed. I just tried to do it in my sandbox and found such a problem. Inability to get 2PC state from unflushed WAL stops me right now. I think about possible solutions. The idea with enableFsync is not a suitable solution, in general, I think. I just pointed it as an alternate idea. You just do enableFsync = false before prepare or commit prepared and do enableFsync = true after these functions. In this case, 2PC records will not be fsync-ed, but FlushPtr will be increased. Thus, 2PC state can be read from WAL on commit prepared without waiting. To make it work correctly, I guess, we have to do some additional work to keep more wal on the master and filter some duplicate transactions on the replica, if replica restarts during catchup. With best regards, Vitaly Davydov
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi Ajin, Thank you for your feedback. Could you please try to increase the number of clients (-c pgbench option) up to 20 or more? It seems, I forgot to specify it. With best regards, Vitaly Davydov On Fri, Feb 23, 2024 at 12:29 AM Давыдов Виталий wrote: Dear All, I'd like to present and talk about a problem when 2PC transactions are applied quite slowly on a replica during logical replication. There is a master and a replica with established logical replication from the master to the replica with twophase = true. With some load level on the master, the replica starts to lag behind the master, and the lag will be increasing. We have to significantly decrease the load on the master to allow replica to complete the catchup. Such problem may create significant difficulties in the production. The problem appears at least on REL_16_STABLE branch. To reproduce the problem: * Setup logical replication from master to replica with subscription parameter twophase = true. * Create some intermediate load on the master (use pgbench with custom sql with prepare+commit) * Optionally switch off the replica for some time (keep load on master). * Switch on the replica and wait until it reaches the master. The replica will never reach the master with even some low load on the master. If to remove the load, the replica will reach the master for much greater time, than expected. I tried the same for regular transactions, but such problem doesn't appear even with a decent load. I tried this setup and I do see that the logical subscriber does reach the master in a short time. I'm not sure what I'm missing. I stopped the logical subscriber in between while pgbench was running and then started it again and ran the following:postgres=# SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication; sent_lsn | pg_current_wal_lsn ---+ 0/6793FA0 | 0/6793FA0 <=== caught up (1 row) My pgbench command:pgbench postgres -p 6972 -c 2 -j 3 -f /home/ajin/test.sql -T 200 -P 5 my custom sql file:cat test.sql SELECT md5(random()::text) as mygid \gset BEGIN; DELETE FROM test WHERE v = pg_backend_pid(); INSERT INTO test(v) SELECT pg_backend_pid(); PREPARE TRANSACTION $$:mygid$$; COMMIT PREPARED $$:mygid$$; regards,Ajin CherianFujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Fri, Feb 23, 2024 at 12:29 AM Давыдов Виталий wrote: > Dear All, > > I'd like to present and talk about a problem when 2PC transactions are > applied quite slowly on a replica during logical replication. There is a > master and a replica with established logical replication from the master > to the replica with twophase = true. With some load level on the master, > the replica starts to lag behind the master, and the lag will be > increasing. We have to significantly decrease the load on the master to > allow replica to complete the catchup. Such problem may create significant > difficulties in the production. The problem appears at least on > REL_16_STABLE branch. > > To reproduce the problem: > >- Setup logical replication from master to replica with subscription >parameter twophase = true. >- Create some intermediate load on the master (use pgbench with custom >sql with prepare+commit) >- Optionally switch off the replica for some time (keep load on >master). >- Switch on the replica and wait until it reaches the master. > > The replica will never reach the master with even some low load on the > master. If to remove the load, the replica will reach the master for much > greater time, than expected. I tried the same for regular transactions, but > such problem doesn't appear even with a decent load. > > > I tried this setup and I do see that the logical subscriber does reach the master in a short time. I'm not sure what I'm missing. I stopped the logical subscriber in between while pgbench was running and then started it again and ran the following: postgres=# SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication; sent_lsn | pg_current_wal_lsn ---+ 0/6793FA0 | 0/6793FA0 <=== caught up (1 row) My pgbench command: pgbench postgres -p 6972 -c 2 -j 3 -f /home/ajin/test.sql -T 200 -P 5 my custom sql file: cat test.sql SELECT md5(random()::text) as mygid \gset BEGIN; DELETE FROM test WHERE v = pg_backend_pid(); INSERT INTO test(v) SELECT pg_backend_pid(); PREPARE TRANSACTION $$:mygid$$; COMMIT PREPARED $$:mygid$$; regards, Ajin Cherian Fujitsu Australia
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, Feb 22, 2024 at 6:59 PM Давыдов Виталий wrote: > > I'd like to present and talk about a problem when 2PC transactions are > applied quite slowly on a replica during logical replication. There is a > master and a replica with established logical replication from the master to > the replica with twophase = true. With some load level on the master, the > replica starts to lag behind the master, and the lag will be increasing. We > have to significantly decrease the load on the master to allow replica to > complete the catchup. Such problem may create significant difficulties in the > production. The problem appears at least on REL_16_STABLE branch. > > To reproduce the problem: > > Setup logical replication from master to replica with subscription parameter > twophase = true. > Create some intermediate load on the master (use pgbench with custom sql with > prepare+commit) > Optionally switch off the replica for some time (keep load on master). > Switch on the replica and wait until it reaches the master. > > The replica will never reach the master with even some low load on the > master. If to remove the load, the replica will reach the master for much > greater time, than expected. I tried the same for regular transactions, but > such problem doesn't appear even with a decent load. > > I think, the main proplem of 2PC catchup bad performance - the lack of > asynchronous commit support for 2PC. For regular transactions asynchronous > commit is used on the replica by default (subscrition sycnronous_commit = > off). It allows the replication worker process on the replica to avoid fsync > (XLogFLush) and to utilize 100% CPU (the background wal writer or > checkpointer will do fsync). I agree, 2PC are mostly used in multimaster > configurations with two or more nodes which are performed synchronously, but > when the node in catchup (node is not online in a multimaster cluster), > asynchronous commit have to be used to speedup the catchup. > I don't see we do anything specific for 2PC transactions to make them behave differently than regular transactions with respect to synchronous_commit setting. What makes you think so? Can you pin point the code you are referring to? > There is another thing that affects on the disbalance of the master and > replica performance. When the master executes requestes from multiple > clients, there is a fsync optimization takes place in XLogFlush. It allows to > decrease the number of fsync in case when a number of parallel backends write > to the WAL simultaneously. The replica applies received transactions in one > thread sequentially, such optimization is not applied. > Right, I think for this we need to implement parallel apply. > I see some possible solutions: > > Implement asyncronous commit for 2PC transactions. > Do some hacking with enableFsync when it is possible. > Can you be a bit more specific about what exactly you have in mind to achieve the above solutions? -- With Regards, Amit Kapila.
Slow catchup of 2PC (twophase) transactions on replica in LR
Dear All, I'd like to present and talk about a problem when 2PC transactions are applied quite slowly on a replica during logical replication. There is a master and a replica with established logical replication from the master to the replica with twophase = true. With some load level on the master, the replica starts to lag behind the master, and the lag will be increasing. We have to significantly decrease the load on the master to allow replica to complete the catchup. Such problem may create significant difficulties in the production. The problem appears at least on REL_16_STABLE branch. To reproduce the problem: * Setup logical replication from master to replica with subscription parameter twophase = true. * Create some intermediate load on the master (use pgbench with custom sql with prepare+commit) * Optionally switch off the replica for some time (keep load on master). * Switch on the replica and wait until it reaches the master. The replica will never reach the master with even some low load on the master. If to remove the load, the replica will reach the master for much greater time, than expected. I tried the same for regular transactions, but such problem doesn't appear even with a decent load. I think, the main proplem of 2PC catchup bad performance - the lack of asynchronous commit support for 2PC. For regular transactions asynchronous commit is used on the replica by default (subscrition sycnronous_commit = off). It allows the replication worker process on the replica to avoid fsync (XLogFLush) and to utilize 100% CPU (the background wal writer or checkpointer will do fsync). I agree, 2PC are mostly used in multimaster configurations with two or more nodes which are performed synchronously, but when the node in catchup (node is not online in a multimaster cluster), asynchronous commit have to be used to speedup the catchup. There is another thing that affects on the disbalance of the master and replica performance. When the master executes requestes from multiple clients, there is a fsync optimization takes place in XLogFlush. It allows to decrease the number of fsync in case when a number of parallel backends write to the WAL simultaneously. The replica applies received transactions in one thread sequentially, such optimization is not applied. I see some possible solutions: * Implement asyncronous commit for 2PC transactions. * Do some hacking with enableFsync when it is possible. I think, asynchronous commit support for 2PC transactions should significantly increase replica performance and help to solve this problem. I tried to implement it (like for usual transactions) but I've found another problem: 2PC state is stored in WAL on prepare, on commit we have to read 2PC state from WAL but the read is delayed until WAL is flushed by the background wal writer (read LSN should be less than flush LSN). Storing 2PC state in a shared memory (as it proposed earlier) may help. I used the following query to monitor the catchup progress on the master:SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication; I used the following script for pgbench to the master:SELECT md5(random()::text) as mygid \gset BEGIN; DELETE FROM test WHERE v = pg_backend_pid(); INSERT INTO test(v) SELECT pg_backend_pid(); PREPARE TRANSACTION $$:mygid$$; COMMIT PREPARED $$:mygid$$; What do you think? With best regards, Vitaly Davydov