Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-16 Thread Peter Smith
Hi Kuroda-san,

I did not apply these v12* patches, but I have diff'ed all of them
with the previous v11* patches and confirmed that all of my previous
review comments now seem to be addressed.

I don't have any more comments to make at this time. Thanks!

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-16 Thread Peter Smith
Hi, Here are my remaining review comments for the latest v11* patches.

//
v11-0001
//

No changes. No comments.

//
v11-0002
//

==
doc/src/sgml/ref/alter_subscription.sgml

2.1.
ALTER SUBSCRIPTION ... SET (failover = on|off) and
-   ALTER SUBSCRIPTION ... SET (two_phase = on|off)
+   ALTER SUBSCRIPTION ... SET (two_phase = off)

My other thread patch has already been pushed [1], so now you can
modify this to say "true|false" as previously suggested.


//
v11-0003
//

==
src/backend/commands/subscriptioncmds.c

3.1. AlterSubscription

+ subtwophase = LOGICALREP_TWOPHASE_STATE_DISABLED;
+ }
+ else
+ subtwophase = LOGICALREP_TWOPHASE_STATE_PENDING;
+
+
  /* Change system catalog acoordingly */
  values[Anum_pg_subscription_subtwophasestate - 1] =
- CharGetDatum(opts.twophase ?
- LOGICALREP_TWOPHASE_STATE_PENDING :
- LOGICALREP_TWOPHASE_STATE_DISABLED);
+ CharGetDatum(subtwophase);
  replaces[Anum_pg_subscription_subtwophasestate - 1] = true;

Sorry, I don't think that 'subtwophase' change is an improvement --
IMO the existing ternary code was fine as-is.

I only reported the excessive flag checking in the previous v10-0003
review because of some extra "if (!opts.twophase)" code but that was
caused by what you called "wrong git operations." and is already fixed
now.

//
v11-0004
//

==
src/sgml/catalogs.sgml

4.1.
+ 
+  
+   subforcealter bool
+  
+  
+   If true, then the ALTER
SUBSCRIPTION
+   can disable two_phase option, even if there are
+   uncommitted prepared transactions from when two_phase
+   was enabled
+  
+ 
+

I think this description should be changed to say what it *really*
does. IMO, the stuff about 'two_phase' is more like a side-effect.

SUGGESTION (this is just pseudo-code. You can add the CREATE
SUBSCRIPTION 'force_alter' link appropriately)

If true, then the ALTER SUBSCRIPTION command can
sometimes be forced to proceed instead of giving an error. See
force_alter parameter for details about when this might
be useful.

==
[1] 
https://github.com/postgres/postgres/commit/fa65a022db26bc446fb67ce1d7ac543fa4bb72e4

Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-15 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

> You wrote "Fixed" for that patch v9-0004 suggestion but I don't think
> anything was changed at all. Accidentally missed?

Sorry, I missed to do `git add` after the revision.
The change was also included in new patch [1].

[1]: 
https://www.postgresql.org/message-id/OSBPR01MB25522052F9F3E3AAD3BA2A8CF5ED2%40OSBPR01MB2552.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-14 Thread Peter Smith
On Tue, May 14, 2024 at 10:03 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Peter,
>
...
> > 4.11.
> >
> > +# Alter the two_phase with the force_alter option. Apart from the the last
> > +# ALTER SUBSCRIPTION command, the command will abort the prepared
> > transaction
> > +# and succeed.
> >
> > There is typo "the the" and the wording is a bit strange. Why not just say:
> >
> > SUGGESTION
> > Alter the two_phase true to false with the force_alter option enabled.
> > This command will succeed after aborting the prepared transaction.
>
> Fixed.
>

You wrote "Fixed" for that patch v9-0004 suggestion but I don't think
anything was changed at all. Accidentally missed?

==
Kind Regards,
Peter Smith.
Futjisu Australia




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-14 Thread Peter Smith
Hi Kuroda-san. Here are my review comments for latest v10* patches.

//
patch v10-0001
//

No changes. No comments.

//
patch v10-0002
//

==
Commit message

2.1.
Regarding the false->true case, the backend process alters the subtwophase to
LOGICALREP_TWOPHASE_STATE_PENDING once. After the subscription is enabled, a new
logical replication worker requests to change the two_phase option of its slot
from pending to true after the initial data synchronization is done. The code
path is the same as the case in which two_phase is initially set to true, so
there is no need to do something remarkable. However, for the true->false case,
the backend must connect to the publisher and expressly change the parameter
because the apply worker does not alter the option to false. The
operation cannot
be rolled back, and altering the parameter from "true" to "false" within a
transaction is prohibited.

~

BEFORE
The operation cannot be rolled back, and altering the parameter from
"true" to "false" within a transaction is prohibited.

SUGGESTION
Because this operation cannot be rolled back, altering the two_phase
parameter from "true" to "false" within a transaction is prohibited.

==
doc/src/sgml/ref/alter_subscription.sgml

2.2.
ALTER SUBSCRIPTION ... SET (failover = on|off) and
-   ALTER SUBSCRIPTION ... SET (two_phase = on|off)
+   ALTER SUBSCRIPTION ... SET (two_phase = off)

I wasn't sure why you chose to keep on|off here instead of true|false,
since in subsequence patch 0003 you changed it true/false everywhere
as discussed in previous reviews.

OTOH if you only did this to be consistent with the "failover=on|off"
then that is OK; but in that case I might raise a separate hackers
thread to propose those should also be changed to true|false for
consistency with the parameer listed on the CREATE SUBSCRIPTION page.
What do you think?

==
src/backend/commands/subscriptioncmds.c

2.3.
  /*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Altering the parameter from "true" to "false" within a
+ * transaction is prohibited. Since the apply worker does
+ * not alter the slot option to false, the backend must
+ * connect to the publisher and expressly change the
+ * parameter.
+ *
+ * There is no need to do something remarkable regarding
+ * the "false" to "true" case; the backend process alters
+ * subtwophase to LOGICALREP_TWOPHASE_STATE_PENDING once.
+ * After the subscription is enabled, a new logical
+ * replication worker requests to change the two_phase
+ * option of its slot when the initial data synchronization
+ * is done. The code path is the same as the case in which
+ * two_phase is initially set to true.
  */

BEFORE
...worker requests to change the two_phase option of its slot when...

SUGGESTION
...worker requests to change the two_phase option of its slot from
pending to true when...

==
src/test/subscription/t/099_twophase_added.pl

2.4.
+#
+# Check the case that prepared transactions exist on the publisher node.
+#
+# Since the two_phase is "off", then normally, this PREPARE will do nothing
+# until the COMMIT PREPARED, but in this test, we toggle the two_phase to
+# "true" again before the COMMIT PREPARED happens.

/Since the two_phase is "off"/Since the two_phase is "false"/

//
patch v10-0003
//

==
src/backend/commands/subscriptioncmds.c

3.1. AlterSubscription

+ * If two_phase was enabled, there is a possibility that
+ * transactions have already been PREPARE'd. They must be
+ * checked and rolled back.
  */
  if (!opts.twophase)

I think it will less ambiguous if you modify this to say "If two_phase
was previously enabled"

~~~

3.2.
if (!opts.twophase)
{
List *prepared_xacts;

/*
* Altering the parameter from "true" to "false" within
* a transaction is prohibited. Since the apply worker
* does not alter the slot option to false, the backend
* must connect to the publisher and expressly change
* the parameter.
*
* There is no need to do something remarkable
* regarding the "false" to "true" case; the backend
* process alters subtwophase to
* LOGICALREP_TWOPHASE_STATE_PENDING once. After the
* subscription is enabled, a new logical replication
* worker requests to change the two_phase option of
* its slot when the initial data synchronization is
* done. The code path is the same as the case in which
* two_phase is initially set to true.
*/
if (!opts.twophase)
PreventInTransactionBlock(isTopLevel,
"ALTER SUBSCRIPTION ... SET (two_phase = false)");

/*
* To prevent prepared transactions from being
* isolated, they must manually be aborted.
*/
if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
(prepared_xacts = GetGidListBySubid(subid)) != NIL)
{
/* Abort all listed transactions */
foreach_ptr(char, gid, prepared_xacts)
FinishPreparedTransaction(gid, false);

list_free_deep(prepared_xacts);
}
}

/* Change system catalog acoordingly */

RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-14 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! New patch is available in [1].

> I'm having second thoughts about how these patches mention the option
> values "on|off". These are used in the ALTER SUBSCRIPTION document
> page for 'two_phase' and 'failover' parameters, and then those
> "on|off" get propagated to the code comments, error messages, and
> tests...
> 
> Now I see that on the CREATE SUBSCRIPTION page [1], every boolean
> parameter (even including 'two_phase' and 'failover') is described in
> terms of "true|false" (not "on|off").

Hmm. But I could sentences like "The default value is off,...". Also, in 
alter_subscription.sgml,
"on|off" notation has already been used. Not sure, but I felt there are no 
rules around here.

> In hindsight, it is probably better to refer only to true|false
> everywhere for these boolean parameters, instead of sometimes using
> different values like on|off.
> 
> What do you think?

It's OK for me to make message/code comments consistent. Not sure the 
documentation,
but followed only my part.

[1]: 
https://www.postgresql.org/message-id/OSBPR01MB2552F66463EFCFD654E87C09F5E32%40OSBPR01MB2552.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi Kuroda-san, Here are some review comments for all patches v9*

//
Patch v9-0001
//

There were no changes since v8-0001, so no comments.

//
Patch v9-0002
//

==
Commit Message

2.1.
Regarding the off->on case, the logical replication already has a
mechanism for it, so there is no need to do anything special for the
on->off case; however, we must connect to the publisher and expressly
change the parameter. The operation cannot be rolled back, and
altering the parameter from "on" to "off" within a transaction is
prohibited.

In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time.

~

This explanation seems to be going around in circles, without giving
any new information:

AFAICT, "Regarding the off->on case, the logical replication already
has a mechanism for it, so there is no need to do anything special for
the on->off case;"

is saying pretty much the same as:

"In the opposite case, there is no need to prevent this because the
logical replication worker already had the mechanism to alter the slot
option at a convenient time."

But, what I hoped for in previous review comments was an explanation
somewhat less vague than "already has a mechanism" or "already had the
mechanism". Can't this have just 1 or 2 lines to say WHAT is that
existing mechanism for the "off" to "on" case, and WHY that means
there is nothing special to do in that scenario?

==
src/backend/commands/subscriptioncmds.c

2.2. AlterSubscription

  /*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since altering the two_phase option of subscriptions
+ * also leads to changing the slot option, this command
+ * cannot be rolled back. So prevent this if we are in a
+ * transaction block. In the opposite case, there is no
+ * need to prevent this because the logical replication
+ * worker already had the mechanism to alter the slot
+ * option at a convenient time.
  */

(Same previous review comments, and same as my review comment for the
commit message above).

I don't think "already had the mechanism" is enough explanation.

Also, the 2nd sentence doesn't make sense here because the comment
only said "altering the slot option" -- it didn't say it was altering
it to "on" or altering it to "off", so "the opposite case" has no
meaning.

~~~

2.3. AlterSubscription

  /*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option). The slot option must be altered
+ * only when changing "on" to "off". Because in opposite case, the logical
+ * replication worker already has the mechanism to do so at a convenient
+ * time.
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+ !opts.twophase);

This is again the same as other review comments above. Probably, when
some better explanation can be found for "already has the mechanism to
do so at a convenient time." then all of these places can be changed
using similar text.

//
Patch v9-0003
//

There are some imperfect code comments but AFAIK they are the same
ones from patch 0002. I think patch 0003 is just moving those comments
to different places, so probably they would already be addressed by
patch 0002.

//
Patch v9-0004
//

==
doc/src/sgml/catalogs.sgml

4.1.
+ 
+  
+   subforcealter bool
+  
+  
+   If true, the subscription can be altered two_phase
+   option, even if there are prepared transactions
+  
+ 
+

BEFORE
If true, the subscription can be altered two_phase
option, even if there are prepared transactions

SUGGESTION
If true, then the ALTER SUBSCRIPTION command can disable
two_phase option, even if there are uncommitted
prepared transactions from when two_phase was
enabled

==
doc/src/sgml/ref/alter_subscription.sgml

4.2.
-
- 
-  The two_phase parameter can only be altered when the
-  subscription is disabled. When altering the parameter from
on
-  to off, the backend process checks for any incomplete
-  prepared transactions done by the logical replication worker (from when
-  two_phase parameter was still on)
-  and, if any are found, those are aborted.
- 

Well, I still think there ought to be some mention of the relationship
between 'force_alter' and 'two_phase' given on this ALTER SUBSCRIPTION
page. Then the user can cross-reference to read what the 'force_alter'
actually does.

==
doc/src/sgml/ref/create_subscription.sgml

4.3.
+
+   
+force_alter (boolean)
+
+ 
+  Specifies whether the subscription can be altered
+  two_phase option, even if there are prepared

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi Kuroda-san,

I'm having second thoughts about how these patches mention the option
values "on|off". These are used in the ALTER SUBSCRIPTION document
page for 'two_phase' and 'failover' parameters, and then those
"on|off" get propagated to the code comments, error messages, and
tests...

Now I see that on the CREATE SUBSCRIPTION page [1], every boolean
parameter (even including 'two_phase' and 'failover') is described in
terms of "true|false" (not "on|off").

In hindsight, it is probably better to refer only to true|false
everywhere for these boolean parameters, instead of sometimes using
different values like on|off.

What do you think?

==
[1] https://www.postgresql.org/docs/devel/sql-createsubscription.html

Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for giving comments! New patch was posted in [1].

> 0.1 General - Patch name
> 
> /SUBSCIRPTION/SUBSCRIPTION/

Fixed.

> ==
> 0.2 General - Apply
> 
> FYI, there are whitespace warnings:
> 
> git
> apply ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTI
> ON-.-S.patch
> ../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-
> S.patch:191:
> trailing whitespace.
> # command will abort the prepared transaction and succeed.
> warning: 1 line adds whitespace errors.

I didn't recognize, fixed.

> ==
> 0.3 General - Regression test fails
> 
> The subscription regression tests are not working.
> 
> ok 158   + publication  1187 ms
> not ok 159   + subscription  123 ms
> 
> See review comments #4 and #5 below for the reason why.

Yeah, I missed to update the expected result. Fixed.

> ==
> src/sgml/ref/alter_subscription.sgml
> 
> 1.
>   
>The two_phase parameter can only be altered when
> the
> -  subscription is disabled. When altering the parameter from
> on
> -  to off, the backend process checks prepared
> -  transactions done by the logical replication worker and aborts them.
> +  subscription is disabled. Altering the parameter from
> on
> +  to off will be failed when there are prepared
> +  transactions done by the logical replication worker. If you want to 
> alter
> +  the parameter forcibly in this case, force_alter
> +  option must be set to on at the same time. If
> +  specified, the backend process aborts prepared transactions.
>   
> 1a.
> That "will be failed when..." seems strange. Maybe say "will give an
> error when..."
> 
> ~
> 1b.
> Because "force" is a verb, I think true/false is more natural than
> on/off for this new boolean option. e.g. it acts more like a "flag"
> than a "mode". See all the other boolean options in CREATE
> SUBSCRIPTION -- those are mostly all verbs too and are all true/false
> AFAIK.

Fixed, but note that the part was moved.

> 
> ==
> 
> 2. CREATE SUBSCRIPTION
> 
> For my previous review, two comments [v7-0004#2] and [v7-0004#3] were
> not addressed. Kuroda-san wrote:
> Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should
> we modify to accept and add the description in the doc?
> 
> ~
> 
> Yes, that is what I am suggesting. IMO it is odd for the user to be
> able to ALTER a parameter that cannot be included in the CREATE
> SUBSCRIPTION in the first place. AFAIK there are no other parameters
> that behave that way.

Hmm. I felt that this change required the new attribute in pg_subscription 
system
catalog. Previously I did not like because it contains huge change, but...I 
tried to do.
New attribute 'subforcealter', and some parts were updated accordingly.

> src/backend/commands/subscriptioncmds.c
> 
> 3. AlterSubscription
> 
> + if (!opts.force_alter)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot alter %s when there are prepared transactions",
> + "two_phase = off"),
> + errhint("Resolve these transactions or set %s at the same time, and
> then try again.",
> + "force_alter = true")));
> 
> I think saying "at the same time" in the hint is unnecessary. Surely
> the user is allowed to set this parameter separately if they want to?
> 
> e.g.
> ALTER SUBSCRIPTION sub SET (force_alter=true);
> ALTER SUBSCRIPTION sub SET (two_phase=off);

Actually, it was correct. Since force_alter was not recorded in the system 
catalog, it must
be specified at the same time.
Now, we allow to be separated, so removed.

> ==
> src/test/regress/expected/subscription.out
> 
> 4.
> +-- fail - force_alter cannot be set alone
> +ALTER SUBSCRIPTION regress_testsub SET (force_alter = true);
> +ERROR:  force_alter must be specified with two_phase
> 
> This error cannot happen. You removed that error!

Fixed.

> ==
> src/test/subscription/t/099_twophase_added.pl
> 
> 6.
> +# Try altering the two_phase option to "off." The command will fail since 
> there
> +# is a prepared transaction and the force option is not specified.
> +my $stdout;
> +my $stderr;
> +
> +($result, $stdout, $stderr) = $node_subscriber->psql(
> + 'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);");
> +ok($stderr =~ /cannot alter two_phase = off when there are prepared
> transactions/,
> + 'ALTER SUBSCRIPTION failed');
> 
> /force option is not specified./'force_alter' option is not specified as 
> true./

Fixed.

> 
> 7.
> +# Verify the prepared transaction still exists
> +$result = $node_subscriber->safe_psql('postgres',
> +"SELECT count(*) FROM pg_prepared_xacts;");
> +is($result, q(1), "prepared transaction still exits");
> +
> 
> TYPO: /exits/exists/

Fixed.

> 
> ~~~
> 
> 8.
> +# Alter the two_phase with the force_alter option. Apart from the above, the
> +# command will abort the prepared transaction and succeed.
> 

RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! Patch can be available in [1].

> ==
> src/sgml/ref/alter_subscription.sgml
> 
> 1.
> + 
> +  The two_phase parameter can only be altered when
> the
> +  subscription is disabled. When altering the parameter from
> on
> +  to off, the backend process checks prepared
> +  transactions done by the logical replication worker and aborts them.
> + 
> 
> The text may be OK as-is, but I was wondering if it might be better to
> give a more verbose explanation.
> 
> BEFORE
> ... the backend process checks prepared transactions done by the
> logical replication worker and aborts them.
> 
> SUGGESTION
> ... the backend process checks for any incomplete prepared
> transactions done by the logical replication worker (from when
> two_phase parameter was still "on") and, if any are found, those are
> aborted.
>

Fixed.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 2. AlterSubscription
> 
> - /*
> - * Since the altering two_phase option of subscriptions
> - * also leads to the change of slot option, this command
> - * cannot be rolled back. So prevent we are in the
> - * transaction block.
> + * If two_phase was enabled, there is a possibility the
> + * transactions has already been PREPARE'd. They must be
> + * checked and rolled back.
>   */
> 
> BEFORE
> ... there is a possibility the transactions has already been PREPARE'd.
> 
> SUGGESTION
> ... there is a possibility that transactions have already been PREPARE'd.

Fixed.

> 3. AlterSubscription
> + /*
> + * Since the altering two_phase option of subscriptions
> + * (especially on->off case) also leads to the
> + * change of slot option, this command cannot be rolled
> + * back. So prevent we are in the transaction block.
> + */
>   PreventInTransactionBlock(isTopLevel,
> "ALTER SUBSCRIPTION ... SET (two_phase = off)");
> 
> 
> This comment is a bit vague and includes some typos, but IIUC these
> problems will already be addressed by the 0002 patch changes.AFAIK
> patch 0003 is only moving the 0002 comment.

Yeah, the comment was updated accordingly.

> 
> ==
> src/test/subscription/t/099_twophase_added.pl
> 
> 4.
> +# Check the case that prepared transactions exist on the subscriber node
> +#
> +# If the two_phase is altering from "on" to "off" and there are prepared
> +# transactions on the subscriber, they must be aborted. This test checks it.
> +
> 
> Similar to the comment that I gave for v8-0002. I think there should
> be  comment for the major test comment to
> distinguish it from comments for the sub-steps.

Added.

> 5.
> +# Verify the prepared transaction are aborted because two_phase is changed to
> +# "off".
> +$result = $node_subscriber->safe_psql('postgres',
> +"SELECT count(*) FROM pg_prepared_xacts;");
> +is($result, q(0), "prepared transaction done by worker is aborted");
> +
> 
> /the prepared transaction are aborted/any prepared transactions are aborted/

Fixed.

[1]: 
https://www.postgresql.org/message-id/OSBPR01MB2552FEA48D265EA278AA9F7AF5E22%40OSBPR01MB2552.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi, Here are some review comments for v8-0002.

==
Commit message

1.
Regarding the off->on case, the logical replication already has a mechanism for
it, so there is no need to do anything special for the on->off case; however,
we must connect to the publisher and expressly change the parameter. The
operation cannot be rolled back, and altering the parameter from "on" to "off"
within a transaction is prohibited.

~

I think the difference between "off"-to"on" and "on"-to"off" needs to
be explained in more detail. Specifically "already has a mechanism for
it" seems very vague.

==
src/backend/commands/subscriptioncmds.c

2.
  /*
- * The changed two_phase option of the slot can't be rolled
- * back.
+ * Since the altering two_phase option of subscriptions
+ * also leads to the change of slot option, this command
+ * cannot be rolled back. So prevent we are in the
+ * transaction block.
  */
- PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET
(two_phase)");
+ if (!opts.twophase)
+ PreventInTransactionBlock(isTopLevel,
+

2a.
There is a typo: "So prevent we are".

SUGGESTION (minor adjustment and typo fix)
Since altering the two_phase option of subscriptions also leads to
changing the slot option, this command cannot be rolled back. So
prevent this if we are in a transaction block.

~

2b.
But, in my previous review [v7-0002#3] I asked if the comment could
explain why this check is only needed for two_phase "on"-to-"off" but
not for "off"-to-"on". That explanation/reason is still not yet given
in the latest comment.

~~~

3.
  /*
- * Try to acquire the connection necessary for altering slot.
+ * Check the need to alter the replication slot. Failover and two_phase
+ * options are controlled by both the publisher (as a slot option) and the
+ * subscriber (as a subscription option).
+ */
+ update_failover = replaces[Anum_pg_subscription_subfailover - 1];
+ update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
+ !opts.twophase);


(This is similar to the previous comment). In my previous review
[v7-0002#3a] I asked why update_two_phase is TRUE only if 'two-phase'
is being updated "on"-to-"off", but not when it is being updated
"off"-to-"on". That explanation/reason is still not yet given in the
latest comment.

==
src/backend/replication/libpqwalreceiver/libpqwalreceiver.c

4.
- appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( FAILOVER %s,
TWO_PHASE %s )",
- quote_identifier(slotname),
- failover ? "true" : "false",
- two_phase ? "true" : "false");
+ appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( ",
+ quote_identifier(slotname));
+
+ if (failover)
+ appendStringInfo(, "FAILOVER %s",
+ *failover ? "true" : "false");
+
+ if (failover && two_phase)
+ appendStringInfo(, ", ");
+
+ if (two_phase)
+ appendStringInfo(, "TWO_PHASE %s",
+ *two_phase ? "true" : "false");
+
+ appendStringInfoString(, ");");

It will be better if that last line includes the extra space like I
had suggested in [v7-0002#4a] so the result will have the same spacing
as in the original code. e.g.

+ appendStringInfoString(, " );");

==
src/test/subscription/t/099_twophase_added.pl

5.
+# Check the case that prepared transactions exist on the publisher node.
+#
+# Since the two_phase is "off", then normally, this PREPARE will do nothing
+# until the COMMIT PREPARED, but in this test, we toggle the two_phase to "on"
+# again before the COMMIT PREPARED happens.

This is a major test part so IMO this comment should have
# like it had before, to distinguish it from all
the sub-step comments.

==

My v7-0002 review -
https://www.postgresql.org/message-id/CAHut%2BPtu_w_UCGR-5DbenA%2By7wRiA8QPi_ZP%3DCCJ3SGdTn-%3D%3Dg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia.




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi, Here are some comments for v8-0004

==
0.1 General - Patch name

/SUBSCIRPTION/SUBSCRIPTION/

==
0.2 General - Apply

FYI, there are whitespace warnings:

git apply 
../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch
../patches_misc/v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch:191:
trailing whitespace.
# command will abort the prepared transaction and succeed.
warning: 1 line adds whitespace errors.

==
0.3 General - Regression test fails

The subscription regression tests are not working.

ok 158   + publication  1187 ms
not ok 159   + subscription  123 ms

See review comments #4 and #5 below for the reason why.

==
src/sgml/ref/alter_subscription.sgml

1.
  
   The two_phase parameter can only be altered when the
-  subscription is disabled. When altering the parameter from
on
-  to off, the backend process checks prepared
-  transactions done by the logical replication worker and aborts them.
+  subscription is disabled. Altering the parameter from
on
+  to off will be failed when there are prepared
+  transactions done by the logical replication worker. If you want to alter
+  the parameter forcibly in this case, force_alter
+  option must be set to on at the same time. If
+  specified, the backend process aborts prepared transactions.
  
1a.
That "will be failed when..." seems strange. Maybe say "will give an
error when..."

~
1b.
Because "force" is a verb, I think true/false is more natural than
on/off for this new boolean option. e.g. it acts more like a "flag"
than a "mode". See all the other boolean options in CREATE
SUBSCRIPTION -- those are mostly all verbs too and are all true/false
AFAIK.

==

2. CREATE SUBSCRIPTION

For my previous review, two comments [v7-0004#2] and [v7-0004#3] were
not addressed. Kuroda-san wrote:
Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should
we modify to accept and add the description in the doc?

~

Yes, that is what I am suggesting. IMO it is odd for the user to be
able to ALTER a parameter that cannot be included in the CREATE
SUBSCRIPTION in the first place. AFAIK there are no other parameters
that behave that way.

==
src/backend/commands/subscriptioncmds.c

3. AlterSubscription

+ if (!opts.force_alter)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter %s when there are prepared transactions",
+ "two_phase = off"),
+ errhint("Resolve these transactions or set %s at the same time, and
then try again.",
+ "force_alter = true")));

I think saying "at the same time" in the hint is unnecessary. Surely
the user is allowed to set this parameter separately if they want to?

e.g.
ALTER SUBSCRIPTION sub SET (force_alter=true);
ALTER SUBSCRIPTION sub SET (two_phase=off);

==
src/test/regress/expected/subscription.out

4.
+-- fail - force_alter cannot be set alone
+ALTER SUBSCRIPTION regress_testsub SET (force_alter = true);
+ERROR:  force_alter must be specified with two_phase

This error cannot happen. You removed that error!

==
src/test/regress/sql/subscription.sql

5.
+-- fail - force_alter cannot be set alone
+ALTER SUBSCRIPTION regress_testsub SET (force_alter = true);

Why is this being tested? You removed that error condition.

==
src/test/subscription/t/099_twophase_added.pl

6.
+# Try altering the two_phase option to "off." The command will fail since there
+# is a prepared transaction and the force option is not specified.
+my $stdout;
+my $stderr;
+
+($result, $stdout, $stderr) = $node_subscriber->psql(
+ 'postgres', "ALTER SUBSCRIPTION regress_sub SET (two_phase = off);");
+ok($stderr =~ /cannot alter two_phase = off when there are prepared
transactions/,
+ 'ALTER SUBSCRIPTION failed');

/force option is not specified./'force_alter' option is not specified as true./

~~~

7.
+# Verify the prepared transaction still exists
+$result = $node_subscriber->safe_psql('postgres',
+"SELECT count(*) FROM pg_prepared_xacts;");
+is($result, q(1), "prepared transaction still exits");
+

TYPO: /exits/exists/

~~~

8.
+# Alter the two_phase with the force_alter option. Apart from the above, the
+# command will abort the prepared transaction and succeed.
+$node_subscriber->safe_psql('postgres',
+"ALTER SUBSCRIPTION regress_sub SET (two_phase = off, force_alter
= true);");
+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION
regress_sub ENABLE;");
+

What does "Apart from the above" mean? Be more explicit.

~~~

9.
+# Verify the prepared transaction are aborted
 $result = $node_subscriber->safe_psql('postgres',
 "SELECT count(*) FROM pg_prepared_xacts;");
 is($result, q(0), "prepared transaction done by worker is aborted");

/transaction are aborted/transaction was aborted/

==
Response to my v7-0004 review --

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-13 Thread Peter Smith
Hi, Here are some review comments for v8-0003.

==
src/sgml/ref/alter_subscription.sgml

1.
+ 
+  The two_phase parameter can only be altered when the
+  subscription is disabled. When altering the parameter from
on
+  to off, the backend process checks prepared
+  transactions done by the logical replication worker and aborts them.
+ 

The text may be OK as-is, but I was wondering if it might be better to
give a more verbose explanation.

BEFORE
... the backend process checks prepared transactions done by the
logical replication worker and aborts them.

SUGGESTION
... the backend process checks for any incomplete prepared
transactions done by the logical replication worker (from when
two_phase parameter was still "on") and, if any are found, those are
aborted.

==
src/backend/commands/subscriptioncmds.c

2. AlterSubscription

- /*
- * Since the altering two_phase option of subscriptions
- * also leads to the change of slot option, this command
- * cannot be rolled back. So prevent we are in the
- * transaction block.
+ * If two_phase was enabled, there is a possibility the
+ * transactions has already been PREPARE'd. They must be
+ * checked and rolled back.
  */

BEFORE
... there is a possibility the transactions has already been PREPARE'd.

SUGGESTION
... there is a possibility that transactions have already been PREPARE'd.

~~~

3. AlterSubscription
+ /*
+ * Since the altering two_phase option of subscriptions
+ * (especially on->off case) also leads to the
+ * change of slot option, this command cannot be rolled
+ * back. So prevent we are in the transaction block.
+ */
  PreventInTransactionBlock(isTopLevel,
"ALTER SUBSCRIPTION ... SET (two_phase = off)");


This comment is a bit vague and includes some typos, but IIUC these
problems will already be addressed by the 0002 patch changes.AFAIK
patch 0003 is only moving the 0002 comment.

==
src/test/subscription/t/099_twophase_added.pl

4.
+# Check the case that prepared transactions exist on the subscriber node
+#
+# If the two_phase is altering from "on" to "off" and there are prepared
+# transactions on the subscriber, they must be aborted. This test checks it.
+

Similar to the comment that I gave for v8-0002. I think there should
be  comment for the major test comment to
distinguish it from comments for the sub-steps.

~~~

5.
+# Verify the prepared transaction are aborted because two_phase is changed to
+# "off".
+$result = $node_subscriber->safe_psql('postgres',
+"SELECT count(*) FROM pg_prepared_xacts;");
+is($result, q(0), "prepared transaction done by worker is aborted");
+

/the prepared transaction are aborted/any prepared transactions are aborted/

==
Kind Regards,
Peter Smith
Fujitsu Australia




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-09 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

> 
> ==
> Commit message
> 
> 1.
> A detailed commit message is needed to describe the purpose and
> details of this patch.

Added.

> ==
> doc/src/sgml/ref/alter_subscription.sgml
> 
> 2. CREATE SUBSCRIPTION
> 
> Shouldn't there be an entry for "force_alter" parameter in the CREATE
> SUBSCRIPTION "parameters" section, instead of just vaguely mentioning
> it in passing when describing the "two_phase" in ALTER SUBSCRIPTION?
>
> 3. ALTER SUBSCRIPTION - alterable parameters
> 
> And shouldn't this new option also be named in the ALTER SUBSCRIPTION
> list: "The parameters that can be altered are..."

Hmm, but the parameter cannot be used for CREATE SUBSCRIPTION. Should we
modify to accept and add the description in the doc? This was not accepted.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 4.
>   XLogRecPtr lsn;
> + bool twophase_force;
>  } SubOpts;
> 
> IMO this field ought to be called 'force_alter' to be the same as the
> option name. Sure, now it is only relevant for 'two_phase', but that
> might not always be the case in the future.

Modified.

> 5. AlterSubscription
> 
> + /*
> + * Abort prepared transactions if force option is also
> + * specified. Otherwise raise an ERROR.
> + */
> + if (!opts.twophase_force)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot alter %s when there are prepared transactions",
> + "two_phase = false")));
> +
> 
> 5a.
> /if force option is also specified/only if the 'force_alter' option is true/

Modified.

> 
> 5b.
> "two_phase = false" -- IMO that should say "two_phase = off"

Modified.

> 5c.
> IMO this ereport should include a errhint to tell the user they can
> use 'force_alter = true' to avoid getting this error.

Hint was added.

> 6.
> 
> + /* force_alter cannot be used standalone */
> + if (IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER) &&
> + !IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("%s must be specified with %s",
> + "force_alter", "two_phase")));
> +
> 
> IMO this rule is not necessary so the code should be removed. I think
> using 'force_alter' standalone doesn't do anything at all (certainly,
> it does no harm) so why add more complications (more rules, more code,
> more tests) just for the sake of it?

Removed. So standalone 'force_alter' is now no-op.

> src/test/subscription/t/099_twophase_added.pl
> 
> 7.
> +$node_subscriber->safe_psql('postgres',
> +"ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter = on);");
> 
> "force" is a verb, so it is better to say 'force_alter = true' instead
> of 'force_alter = on'.

Fixed. Actually not sure it is better because I'm not a native.

> 8.
>  $result = $node_subscriber->safe_psql('postgres',
>  "SELECT count(*) FROM pg_prepared_xacts;");
>  is($result, q(0), "prepared transaction done by worker is aborted");
> 
> +$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub
> ENABLE;");
> +
> 
> I felt the ENABLE statement should be above the SELECT statement so
> that the code is more like it was before applying the patch.

Fixed.

Please see attached patch set.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



v8-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch
Description:  v8-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch


v8-0002-Alter-slot-option-two_phase-only-when-altering-on.patch
Description:  v8-0002-Alter-slot-option-two_phase-only-when-altering-on.patch


v8-0003-Abort-prepared-transactions-while-altering-two_ph.patch
Description:  v8-0003-Abort-prepared-transactions-while-altering-two_ph.patch


v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch
Description:  v8-0004-Add-force_alter-option-for-ALTER-SUBSCIRPTION-.-S.patch


RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-09 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

> Commit Message
> 
> 1.
> The patch needs a commit message to describe the purpose and highlight
> any limitations and other details.

Added.

> ==
> doc/src/sgml/ref/alter_subscription.sgml
> 
> 2.
> +
> + 
> +  The two_phase parameter can only be altered when
> the
> +  subscription is disabled. When altering the parameter from
> true
> +  to false, the backend process checks prepared
> +  transactions done by the logical replication worker and aborts them.
> + 
> 
> Here, the para is referring to "true" and "false" but earlier on this
> page it talks about "twophase = off". IMO it is better to use a
> consistent terminology like "on|off" everywhere instead of randomly
> changing the way it is described each time.

I checked contents and changed to "on|off".

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 3. AlterSubscription
> 
>   if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
>   {
> + List *prepared_xacts = NIL;
> 
> This 'prepared_xacts' can be declared at a lower scrope because it is
> only used if (!opts.twophase).
> 
> Furthermore, IIUC you don't need to assign NIL in the declaration
> because there is no chance for it to be unassigned anyway.

Made the namespace narrower and initialization was removed.

> ~~~
> 
> 4. AlterSubscription
> 
> + /*
> + * The changed two_phase option (true->false) of the
> + * slot can't be rolled back.
> + */
>   PreventInTransactionBlock(isTopLevel,
> "ALTER SUBSCRIPTION ... SET (two_phase = off)");
> 
> Here is another example of inconsistent mixing of the terminology
> where the comment says "true"/"false" but the message says "off".
> Let's keep everything consistent. (I prefer on|off).

Modified.

> ~~~
> 
> 5.
> + if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
> + (prepared_xacts = GetGidListBySubid(subid)) != NIL)
> + {
> + ListCell *cell;
> +
> + /* Abort all listed transactions */
> + foreach(cell, prepared_xacts)
> + FinishPreparedTransaction((char *) lfirst(cell),
> +   false);
> +
> + list_free(prepared_xacts);
> + }
> 
> 5A.
> IIRC there is a cleaner way to write this loop without needing
> ListCell variable -- e.g. foreach_ptr() macro?

Changed.

> 5B.
> Shouldn't this be using list_free_deep() so the pstrdup gid gets freed too?

Yeah, fixed.

> ==
> src/test/subscription/t/099_twophase_added.pl
> 
> 6.
> +##
> +# Check the case that prepared transactions exist on subscriber node
> +##
> +
> 
> Give some more detailed comments here similar to the review comment of
> patch v7-0002 for the other part of this TAP test.
> 
> ~~~
> 
> 7. TAP test - comments
> 
> Same as for my v7-0002 review comments, I think this test case also
> needs a few more one-line comments to describe the sub-steps. e.g.:
> 
> # prepare a transaction to insert some rows to the table
> 
> # verify the prepared tx is replicated to the subscriber (because
> 'two_phase = on')
> 
> # toggle the two_phase to 'off' *before* the COMMIT PREPARED
> 
> # verify the prepared tx got aborted
> 
> # do the COMMIT PREPARED (note that now two_phase is 'off')
> 
> # verify the inserted rows got replicated ok

They were fixed based on your previous comments.

> 
> 8. TAP test - subscription name
> 
> It's better to rename the SUBSCRIPTION in this TAP test so you can
> avoid getting log warnings like:
> 
> psql::4: WARNING:  subscriptions created by regression test
> cases should have names starting with "regress_"
> psql::4: NOTICE:  created replication slot "sub" on publisher

Modified, but it was included in 0001.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-09 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

> ==
> Commit message
> 
> 1.
> IIUC there is quite a lot of subtlety and details about why the slot
> option needs to be changed only when altering "true" to "false", but
> not when altering "false" to "true".
> 
> It also should explain why PreventInTransactionBlock is only needed
> when altering two_phase "true" to "false".
> 
> Please include a commit message to describe all those tricky details.

Added.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 2. AlterSubscription
> 
> - PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET
> (two_phase)");
> + if (!opts.twophase)
> + PreventInTransactionBlock(isTopLevel,
> +   "ALTER SUBSCRIPTION ... SET (two_phase = off)");
> 
> IMO this needs a comment to explain why PreventInTransactionBlock is
> only needed when changing the 'two_phase' option from on to off.

Added. Thoutht?

> 3. AlterSubscription
> 
> /*
> * Try to acquire the connection necessary for altering slot.
> *
> * This has to be at the end because otherwise if there is an error while
> * doing the database operations we won't be able to rollback altered
> * slot.
> */
> if (replaces[Anum_pg_subscription_subfailover - 1] ||
> replaces[Anum_pg_subscription_subtwophasestate - 1])
> {
> bool must_use_password;
> char*err;
> WalReceiverConn *wrconn;
> bool failover_needs_to_be_updated;
> bool two_phase_needs_to_be_updated;
> 
> /* Load the library providing us libpq calls. */
> load_file("libpqwalreceiver", false);
> 
> /* Try to connect to the publisher. */
> must_use_password = sub->passwordrequired && !sub->ownersuperuser;
> wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password,
> sub->name, );
> if (!wrconn)
> ereport(ERROR,
> (errcode(ERRCODE_CONNECTION_FAILURE),
> errmsg("could not connect to the publisher: %s", err)));
> 
> /*
> * Consider which slot option must be altered.
> *
> * We must alter the failover option whenever subfailover is updated.
> * Two_phase, however, is altered only when changing true to false.
> */
> failover_needs_to_be_updated =
> replaces[Anum_pg_subscription_subfailover - 1];
> two_phase_needs_to_be_updated =
> (replaces[Anum_pg_subscription_subtwophasestate - 1] &&
> !opts.twophase);
> 
> PG_TRY();
> {
> if (two_phase_needs_to_be_updated || failover_needs_to_be_updated)
> walrcv_alter_slot(wrconn, sub->slotname,
>   failover_needs_to_be_updated ?  : NULL,
>   two_phase_needs_to_be_updated ?  : NULL);
> }
> PG_FINALLY();
> {
> walrcv_disconnect(wrconn);
> }
> PG_END_TRY();
> }
> 
> 3a.
> The block comment "Consider which slot option must be altered..." says
> WHEN those options need to be updated, but it doesn't say WHY. e.g.
> why only update the 'two_phase' when it is being disabled but not when
> it is being enabled? In other words, I think there needs to be more
> background/reason details given in this comment.
> 
> ~~~
> 
> 3b.
> Can't those 2 new variable assignments be done up-front and guard this
> entire "if-block" instead of the current replaces[] guarding it? Then
> the code is somewhat simplified.
> 
> SUGGESTION:
> /*
>  * 
>  */
> update_failover = replaces[Anum_pg_subscription_subfailover - 1];
> update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate -
> 1] && !opts.twophase);
> 
> /*
>  * Try to acquire the connection necessary for altering slot.
>  *
>  * This has to be at the end because otherwise if there is an error while
>  * doing the database operations we won't be able to rollback altered
>  * slot.
>  */
> if (update_failover || update_two_phase)
> {
>   ...
> 
>   /* Load the library providing us libpq calls. */
>   load_file("libpqwalreceiver", false);
> 
>   /* Try to connect to the publisher. */
>   must_use_password = sub->passwordrequired && !sub->ownersuperuser;
>   wrconn = walrcv_connect(sub->conninfo, true, true,
> must_use_password, sub->name, );
>   if (!wrconn)
> ereport(ERROR, ...);
> 
>   PG_TRY();
>   {
> walrcv_alter_slot(wrconn, sub->slotname,
>   update_failover ?  : NULL,
>   update_two_phase ?  : NULL);
>   }
>   PG_FINALLY();
>   {
> walrcv_disconnect(wrconn);
>   }
>   PG_END_TRY();
> }

Two variables were added and comments were updated.

> .../libpqwalreceiver/libpqwalreceiver.c
> 
> 4.
> + appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( ",
> + quote_identifier(slotname));
> +
> + if (failover)
> + appendStringInfo(, "FAILOVER %s ",
> + (*failover) ? "true" : "false");
> +
> + if (two_phase)
> + appendStringInfo(, "TWO_PHASE %s%s ",
> + (*two_phase) ? "true" : "false",
> + failover ? ", " : "");
> +
> + appendStringInfoString(, ");");
> 
> 4a.
> IIUC the comma logic here was broken in v7 when you swapped the order.
> Anyway, IMO it will be better NOT to try combining that comma logic
> with the existing appendStringInfo. Doing it separately is both easier
> and less error-prone.
> 
> Furthermore, the parentheses like "(*two_phase)" instead of just
> "*two_phase" seemed a bit overkill.
> 
> SUGGESTION:
> + if (failover)
> + 

RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-09 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! The patch will be posted in the upcoming post.

> ==
> src/backend/access/transam/twophase.c
> 
> 1. IsTwoPhaseTransactionGidForSubid
> 
> +/*
> + * IsTwoPhaseTransactionGidForSubid
> + * Check whether the given GID is formed by TwoPhaseTransactionGid.
> + */
> +static bool
> +IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid)
> 
> I think the function comment should mention something about 'subid'.
> 
> SUGGESTION
> Check whether the given GID (as formed by TwoPhaseTransactionGid) is
> for the specified 'subid'.

Fixed.

> src/backend/commands/subscriptioncmds.c
> 
> 2. AlterSubscription
> 
> + if (!opts.twophase &&
> + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED
> &&
> + LookupGXactBySubid(subid))
> + /* Add error message */
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot disable two_phase when uncommitted prepared
> transactions present"),
> + errhint("Resolve these transactions and try again")));
> 
> The comment "/* Add error message */" seems unnecessary.

Yeah, this was an internal flag. Removed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-09 Thread Peter Smith
Hi, Here are some review comments for patch v7-0004

==
Commit message

1.
A detailed commit message is needed to describe the purpose and
details of this patch.

==
doc/src/sgml/ref/alter_subscription.sgml

2. CREATE SUBSCRIPTION

Shouldn't there be an entry for "force_alter" parameter in the CREATE
SUBSCRIPTION "parameters" section, instead of just vaguely mentioning
it in passing when describing the "two_phase" in ALTER SUBSCRIPTION?

~

3. ALTER SUBSCRIPTION - alterable parameters

And shouldn't this new option also be named in the ALTER SUBSCRIPTION
list: "The parameters that can be altered are..."

==
src/backend/commands/subscriptioncmds.c

4.
  XLogRecPtr lsn;
+ bool twophase_force;
 } SubOpts;

IMO this field ought to be called 'force_alter' to be the same as the
option name. Sure, now it is only relevant for 'two_phase', but that
might not always be the case in the future.

~~~

5. AlterSubscription

+ /*
+ * Abort prepared transactions if force option is also
+ * specified. Otherwise raise an ERROR.
+ */
+ if (!opts.twophase_force)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot alter %s when there are prepared transactions",
+ "two_phase = false")));
+

5a.
/if force option is also specified/only if the 'force_alter' option is true/

~

5b.
"two_phase = false" -- IMO that should say "two_phase = off"

~

5c.
IMO this ereport should include a errhint to tell the user they can
use 'force_alter = true' to avoid getting this error.

~~~

6.

+ /* force_alter cannot be used standalone */
+ if (IsSet(opts.specified_opts, SUBOPT_FORCE_ALTER) &&
+ !IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("%s must be specified with %s",
+ "force_alter", "two_phase")));
+

IMO this rule is not necessary so the code should be removed. I think
using 'force_alter' standalone doesn't do anything at all (certainly,
it does no harm) so why add more complications (more rules, more code,
more tests) just for the sake of it?

==
src/test/subscription/t/099_twophase_added.pl

7.
+$node_subscriber->safe_psql('postgres',
+"ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter = on);");

"force" is a verb, so it is better to say 'force_alter = true' instead
of 'force_alter = on'.

~~~

8.
 $result = $node_subscriber->safe_psql('postgres',
 "SELECT count(*) FROM pg_prepared_xacts;");
 is($result, q(0), "prepared transaction done by worker is aborted");

+$node_subscriber->safe_psql('postgres', "ALTER SUBSCRIPTION sub ENABLE;");
+

I felt the ENABLE statement should be above the SELECT statement so
that the code is more like it was before applying the patch.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-09 Thread Peter Smith
Hi, Here are some review comments for the patch v7-0003.

==
Commit Message

1.
The patch needs a commit message to describe the purpose and highlight
any limitations and other details.

==
doc/src/sgml/ref/alter_subscription.sgml

2.
+
+ 
+  The two_phase parameter can only be altered when the
+  subscription is disabled. When altering the parameter from
true
+  to false, the backend process checks prepared
+  transactions done by the logical replication worker and aborts them.
+ 

Here, the para is referring to "true" and "false" but earlier on this
page it talks about "twophase = off". IMO it is better to use a
consistent terminology like "on|off" everywhere instead of randomly
changing the way it is described each time.

==
src/backend/commands/subscriptioncmds.c

3. AlterSubscription

  if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
  {
+ List *prepared_xacts = NIL;

This 'prepared_xacts' can be declared at a lower scrope because it is
only used if (!opts.twophase).

Furthermore, IIUC you don't need to assign NIL in the declaration
because there is no chance for it to be unassigned anyway.

~~~

4. AlterSubscription

+ /*
+ * The changed two_phase option (true->false) of the
+ * slot can't be rolled back.
+ */
  PreventInTransactionBlock(isTopLevel,
"ALTER SUBSCRIPTION ... SET (two_phase = off)");

Here is another example of inconsistent mixing of the terminology
where the comment says "true"/"false" but the message says "off".
Let's keep everything consistent. (I prefer on|off).

~~~

5.
+ if (sub->twophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
+ (prepared_xacts = GetGidListBySubid(subid)) != NIL)
+ {
+ ListCell *cell;
+
+ /* Abort all listed transactions */
+ foreach(cell, prepared_xacts)
+ FinishPreparedTransaction((char *) lfirst(cell),
+   false);
+
+ list_free(prepared_xacts);
+ }

5A.
IIRC there is a cleaner way to write this loop without needing
ListCell variable -- e.g. foreach_ptr() macro?

~

5B.
Shouldn't this be using list_free_deep() so the pstrdup gid gets freed too?

==
src/test/subscription/t/099_twophase_added.pl

6.
+##
+# Check the case that prepared transactions exist on subscriber node
+##
+

Give some more detailed comments here similar to the review comment of
patch v7-0002 for the other part of this TAP test.

~~~

7. TAP test - comments

Same as for my v7-0002 review comments, I think this test case also
needs a few more one-line comments to describe the sub-steps. e.g.:

# prepare a transaction to insert some rows to the table

# verify the prepared tx is replicated to the subscriber (because
'two_phase = on')

# toggle the two_phase to 'off' *before* the COMMIT PREPARED

# verify the prepared tx got aborted

# do the COMMIT PREPARED (note that now two_phase is 'off')

# verify the inserted rows got replicated ok

~~~

8. TAP test - subscription name

It's better to rename the SUBSCRIPTION in this TAP test so you can
avoid getting log warnings like:

psql::4: WARNING:  subscriptions created by regression test
cases should have names starting with "regress_"
psql::4: NOTICE:  created replication slot "sub" on publisher

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-08 Thread Peter Smith
Hi, Here are some review comments for v7-0002

==
Commit message

1.
IIUC there is quite a lot of subtlety and details about why the slot
option needs to be changed only when altering "true" to "false", but
not when altering "false" to "true".

It also should explain why PreventInTransactionBlock is only needed
when altering two_phase "true" to "false".

Please include a commit message to describe all those tricky details.

==
src/backend/commands/subscriptioncmds.c

2. AlterSubscription

- PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET
(two_phase)");
+ if (!opts.twophase)
+ PreventInTransactionBlock(isTopLevel,
+   "ALTER SUBSCRIPTION ... SET (two_phase = off)");

IMO this needs a comment to explain why PreventInTransactionBlock is
only needed when changing the 'two_phase' option from on to off.

~~~

3. AlterSubscription

/*
* Try to acquire the connection necessary for altering slot.
*
* This has to be at the end because otherwise if there is an error while
* doing the database operations we won't be able to rollback altered
* slot.
*/
if (replaces[Anum_pg_subscription_subfailover - 1] ||
replaces[Anum_pg_subscription_subtwophasestate - 1])
{
bool must_use_password;
char*err;
WalReceiverConn *wrconn;
bool failover_needs_to_be_updated;
bool two_phase_needs_to_be_updated;

/* Load the library providing us libpq calls. */
load_file("libpqwalreceiver", false);

/* Try to connect to the publisher. */
must_use_password = sub->passwordrequired && !sub->ownersuperuser;
wrconn = walrcv_connect(sub->conninfo, true, true, must_use_password,
sub->name, );
if (!wrconn)
ereport(ERROR,
(errcode(ERRCODE_CONNECTION_FAILURE),
errmsg("could not connect to the publisher: %s", err)));

/*
* Consider which slot option must be altered.
*
* We must alter the failover option whenever subfailover is updated.
* Two_phase, however, is altered only when changing true to false.
*/
failover_needs_to_be_updated =
replaces[Anum_pg_subscription_subfailover - 1];
two_phase_needs_to_be_updated =
(replaces[Anum_pg_subscription_subtwophasestate - 1] &&
!opts.twophase);

PG_TRY();
{
if (two_phase_needs_to_be_updated || failover_needs_to_be_updated)
walrcv_alter_slot(wrconn, sub->slotname,
  failover_needs_to_be_updated ?  : NULL,
  two_phase_needs_to_be_updated ?  : NULL);
}
PG_FINALLY();
{
walrcv_disconnect(wrconn);
}
PG_END_TRY();
}

3a.
The block comment "Consider which slot option must be altered..." says
WHEN those options need to be updated, but it doesn't say WHY. e.g.
why only update the 'two_phase' when it is being disabled but not when
it is being enabled? In other words, I think there needs to be more
background/reason details given in this comment.

~~~

3b.
Can't those 2 new variable assignments be done up-front and guard this
entire "if-block" instead of the current replaces[] guarding it? Then
the code is somewhat simplified.

SUGGESTION:
/*
 * 
 */
update_failover = replaces[Anum_pg_subscription_subfailover - 1];
update_two_phase = (replaces[Anum_pg_subscription_subtwophasestate -
1] && !opts.twophase);

/*
 * Try to acquire the connection necessary for altering slot.
 *
 * This has to be at the end because otherwise if there is an error while
 * doing the database operations we won't be able to rollback altered
 * slot.
 */
if (update_failover || update_two_phase)
{
  ...

  /* Load the library providing us libpq calls. */
  load_file("libpqwalreceiver", false);

  /* Try to connect to the publisher. */
  must_use_password = sub->passwordrequired && !sub->ownersuperuser;
  wrconn = walrcv_connect(sub->conninfo, true, true,
must_use_password, sub->name, );
  if (!wrconn)
ereport(ERROR, ...);

  PG_TRY();
  {
walrcv_alter_slot(wrconn, sub->slotname,
  update_failover ?  : NULL,
  update_two_phase ?  : NULL);
  }
  PG_FINALLY();
  {
walrcv_disconnect(wrconn);
  }
  PG_END_TRY();
}

==
.../libpqwalreceiver/libpqwalreceiver.c

4.
+ appendStringInfo(, "ALTER_REPLICATION_SLOT %s ( ",
+ quote_identifier(slotname));
+
+ if (failover)
+ appendStringInfo(, "FAILOVER %s ",
+ (*failover) ? "true" : "false");
+
+ if (two_phase)
+ appendStringInfo(, "TWO_PHASE %s%s ",
+ (*two_phase) ? "true" : "false",
+ failover ? ", " : "");
+
+ appendStringInfoString(, ");");

4a.
IIUC the comma logic here was broken in v7 when you swapped the order.
Anyway, IMO it will be better NOT to try combining that comma logic
with the existing appendStringInfo. Doing it separately is both easier
and less error-prone.

Furthermore, the parentheses like "(*two_phase)" instead of just
"*two_phase" seemed a bit overkill.

SUGGESTION:
+ if (failover)
+ appendStringInfo(, "FAILOVER %s",
+ *failover ? "true" : "false");
+
+   if (failover && two_phase)
+   appendStringInfo(, ", ");
+
+ if (two_phase)
+ appendStringInfo(, "TWO_PHASE %s",
+ *two_phase ? "true" : "false");
+
+ appendStringInfoString(, " );");

~~

4b.
Like I said above, IMO the current separator logic in v7 is broken. So
it is a bit concerning 

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-08 Thread Peter Smith
Hi Kuroda-san,

Thanks for addressing most of my v6-0001 review comments.

Below are some minor follow-up comments for v7-0001.

==
src/backend/access/transam/twophase.c

1. IsTwoPhaseTransactionGidForSubid

+/*
+ * IsTwoPhaseTransactionGidForSubid
+ * Check whether the given GID is formed by TwoPhaseTransactionGid.
+ */
+static bool
+IsTwoPhaseTransactionGidForSubid(Oid subid, char *gid)

I think the function comment should mention something about 'subid'.

SUGGESTION
Check whether the given GID (as formed by TwoPhaseTransactionGid) is
for the specified 'subid'.

==
src/backend/commands/subscriptioncmds.c

2. AlterSubscription

+ if (!opts.twophase &&
+ form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
+ LookupGXactBySubid(subid))
+ /* Add error message */
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot disable two_phase when uncommitted prepared
transactions present"),
+ errhint("Resolve these transactions and try again")));

The comment "/* Add error message */" seems unnecessary.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-08 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for reviewing! Here are updated patches.
I updated patches only for HEAD.

> ==
> Commit message
> 
> 1.
> This patch allows user to alter two_phase option
> 
> /allows user/allows the user/
> 
> /to alter two_phase option/to alter the 'two_phase' option/

Fixed.

> ==
> doc/src/sgml/ref/alter_subscription.sgml
> 
> 2.
> two_phase can be altered only for disabled subscription.
> 
> SUGGEST
> The two_phase parameter can only be altered when
> the subscription is disabled.

Fixed.

> ==
> src/backend/access/transam/twophase.c
> 
> 3. checkGid
> +
> +/*
> + * checkGid
> + */
> +static bool
> +checkGid(char *gid, Oid subid)
> +{
> + int ret;
> + Oid subid_written,
> + xid;
> +
> + ret = sscanf(gid, "pg_gid_%u_%u", _written, );
> +
> + if (ret != 2 || subid != subid_written)
> + return false;
> +
> + return true;
> +}
> 
> 3a.
> The function comment should give more explanation of what it does. I
> think this function is the counterpart of the TwoPhaseTransactionGid()
> function of worker.c so the comment can say that too.

Comments were updated.

> 3b.
> Indeed, perhaps the function name should be similar to
> TwoPhaseTransactionGid. e.g. call it like
> IsTwoPhaseTransactionGidForSubid?

Replaced to IsTwoPhaseTransactionGidForSubid().

> 3c.
> Probably 'xid' should be TransactionId instead of Oid.

Right, fixed.

> 3d.
> Why not have a single return?
> 
> SUGGESTION
> return (ret == 2 && subid = subid_written);

Fixed.

> 3e.
> I am wondering if the existing TwoPhaseTransactionGid function
> currently in worker.c should be moved here because IMO these 2
> functions belong together and twophase.c seems the right place to put
> them.

+1, moved.

> ~~~
> 
> 4.
> +LookupGXactBySubid(Oid subid)
> +{
> + bool found = false;
> +
> + LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
> + for (int i = 0; i < TwoPhaseState->numPrepXacts; i++)
> + {
> + GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
> +
> + /* Ignore not-yet-valid GIDs. */
> + if (gxact->valid && checkGid(gxact->gid, subid))
> + {
> + found = true;
> + break;
> + }
> + }
> + LWLockRelease(TwoPhaseStateLock);
> + return found;
> +}
> 
> AFAIK the gxact also has the 'xid' available, so won't it be better to
> pass BOTH the 'xid' and 'subid' to the checkGid so you can do a full
> comparison instead of comparing only the subid part of the gid?

IIUC, the xid written in the gxact means the transaction id on the subscriber,
but formatted GID has xid on the publisher. So the value cannot be used.

> ==
> src/backend/commands/subscriptioncmds.c
> 
> 5. AlterSubscription
> 
> + /* XXX */
> + if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
> + {
> 
> The "XXX" comment looks like it is meant to say something more...

This flag was used only for me, removed.

> ~~~
> 
> 6.
> + /*
> + * two_phase can be only changed for disabled
> + * subscriptions
> + */
> + if (form->subenabled)
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot set %s for enabled subscription",
> + "two_phase")));
> 
> 6a.
> Should this have a more comprehensive comment giving the reason like
> the 'failover' option has?

Modified, but it is almost the same as failover's one.

> 6b.
> Maybe this should include a "translator" comment to say don't
> translate the option name.

Hmm, but other parts in AlterSubscription() does not have.
For now, I kept current style.

> ~~~
> 
> 7.
> + /* Check whether the number of prepared transactions */
> + if (!opts.twophase &&
> + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED
> &&
> + LookupGXactBySubid(subid))
> + ereport(ERROR,
> + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> + errmsg("cannot disable two_phase when uncommitted prepared
> transactions present")));
> +
> 
> 7a.
> The first comment seems to be an incomplete sentence. I think it
> should say something a bit like:
> two_phase cannot be disabled if there are any uncommitted prepared
> transactions present.

Modified, but this part would be replaced by upcoming patches.

> 7b.
> Also, if ereport occurs what is the user supposed to do about it?
> Shouldn't the ereport include some errhint with appropriate advice?

The hint was added, but this part would be replaced by upcoming patches.

> ~~~
> 
> 8.
> + /*
> + * The changed failover option of the slot can't be rolled
> + * back.
> + */
> + PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET
> (two_phase)");
> +
> + /* Change system catalog acoordingly */
> + values[Anum_pg_subscription_subtwophasestate - 1] =
> + CharGetDatum(opts.twophase ?
> + LOGICALREP_TWOPHASE_STATE_PENDING :
> + LOGICALREP_TWOPHASE_STATE_DISABLED);
> + replaces[Anum_pg_subscription_subtwophasestate - 1] = true;
> + }
> 
> Typo I think: /failover option/two_phase option/

Right, fixed.

> ==
> .../libpqwalreceiver/libpqwalreceiver.c
> 
> 9.
>  static void
>  libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname,
> - bool failover)
> + bool 

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-05-07 Thread Peter Smith
Hi, Here are some review comments for patch v6-0001

==
Commit message

1.
This patch allows user to alter two_phase option

/allows user/allows the user/

/to alter two_phase option/to alter the 'two_phase' option/

==
doc/src/sgml/ref/alter_subscription.sgml

2.
two_phase can be altered only for disabled subscription.

SUGGEST
The two_phase parameter can only be altered when
the subscription is disabled.

==
src/backend/access/transam/twophase.c

3. checkGid
+
+/*
+ * checkGid
+ */
+static bool
+checkGid(char *gid, Oid subid)
+{
+ int ret;
+ Oid subid_written,
+ xid;
+
+ ret = sscanf(gid, "pg_gid_%u_%u", _written, );
+
+ if (ret != 2 || subid != subid_written)
+ return false;
+
+ return true;
+}

3a.
The function comment should give more explanation of what it does. I
think this function is the counterpart of the TwoPhaseTransactionGid()
function of worker.c so the comment can say that too.

~

3b.
Indeed, perhaps the function name should be similar to
TwoPhaseTransactionGid. e.g. call it like
IsTwoPhaseTransactionGidForSubid?

~

3c.
Probably 'xid' should be TransactionId instead of Oid.

~

3d.
Why not have a single return?

SUGGESTION
return (ret == 2 && subid = subid_written);

~

3e.
I am wondering if the existing TwoPhaseTransactionGid function
currently in worker.c should be moved here because IMO these 2
functions belong together and twophase.c seems the right place to put
them.

~~~

4.
+LookupGXactBySubid(Oid subid)
+{
+ bool found = false;
+
+ LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+ for (int i = 0; i < TwoPhaseState->numPrepXacts; i++)
+ {
+ GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+ /* Ignore not-yet-valid GIDs. */
+ if (gxact->valid && checkGid(gxact->gid, subid))
+ {
+ found = true;
+ break;
+ }
+ }
+ LWLockRelease(TwoPhaseStateLock);
+ return found;
+}

AFAIK the gxact also has the 'xid' available, so won't it be better to
pass BOTH the 'xid' and 'subid' to the checkGid so you can do a full
comparison instead of comparing only the subid part of the gid?

==
src/backend/commands/subscriptioncmds.c

5. AlterSubscription

+ /* XXX */
+ if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT))
+ {

The "XXX" comment looks like it is meant to say something more...

~~~

6.
+ /*
+ * two_phase can be only changed for disabled
+ * subscriptions
+ */
+ if (form->subenabled)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot set %s for enabled subscription",
+ "two_phase")));

6a.
Should this have a more comprehensive comment giving the reason like
the 'failover' option has?

~~~

6b.
Maybe this should include a "translator" comment to say don't
translate the option name.

~~~

7.
+ /* Check whether the number of prepared transactions */
+ if (!opts.twophase &&
+ form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED &&
+ LookupGXactBySubid(subid))
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot disable two_phase when uncommitted prepared
transactions present")));
+

7a.
The first comment seems to be an incomplete sentence. I think it
should say something a bit like:
two_phase cannot be disabled if there are any uncommitted prepared
transactions present.

~

7b.
Also, if ereport occurs what is the user supposed to do about it?
Shouldn't the ereport include some errhint with appropriate advice?

~~~

8.
+ /*
+ * The changed failover option of the slot can't be rolled
+ * back.
+ */
+ PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET
(two_phase)");
+
+ /* Change system catalog acoordingly */
+ values[Anum_pg_subscription_subtwophasestate - 1] =
+ CharGetDatum(opts.twophase ?
+ LOGICALREP_TWOPHASE_STATE_PENDING :
+ LOGICALREP_TWOPHASE_STATE_DISABLED);
+ replaces[Anum_pg_subscription_subtwophasestate - 1] = true;
+ }

Typo I think: /failover option/two_phase option/

==
.../libpqwalreceiver/libpqwalreceiver.c

9.
 static void
 libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname,
- bool failover)
+ bool two_phase, bool failover)

Same comment as mentioned elsewhere (#15), IMO the new 'two_phase'
parameter should be last.

==
src/backend/replication/logical/launcher.c

10.
+/*
+ * Stop all the subscription workers.
+ */
+void
+logicalrep_workers_stop(Oid subid)
+{
+ List*subworkers;
+ ListCell   *lc;
+
+ LWLockAcquire(LogicalRepWorkerLock, LW_SHARED);
+ subworkers = logicalrep_workers_find(subid, false);
+ LWLockRelease(LogicalRepWorkerLock);
+ foreach(lc, subworkers)
+ {
+ LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc);
+
+ logicalrep_worker_stop(w->subid, w->relid);
+ }
+ list_free(subworkers);
+}

I was confused by the logicalrep_workers_find(subid, false). IIUC the
'false' means everything (instead of 'only_running') but then I don't
know why we want to "stop" anything that is NOT running. OTOH I see
that this code was extracted from where it was previously inlined in
subscriptioncmds.c, so maybe the 'false' is necessary for another

RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-23 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

Per recent commit (b29cbd3da), our patch needed to be rebased.
Here is an updated version.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 
 


v6-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch
Description:  v6-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch


v6-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch
Description:  v6-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch


v6-0003-Abort-prepared-transactions-while-altering-two_ph.patch
Description:  v6-0003-Abort-prepared-transactions-while-altering-two_ph.patch


v6-0004-Add-force_alter-option.patch
Description: v6-0004-Add-force_alter-option.patch


RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-22 Thread Vitaly Davydov

Dear Hayato,

On Monday, April 22, 2024 15:54 MSK, "Hayato Kuroda (Fujitsu)" 
 wrote:
 > Dear Vitaly,
>
> > I looked at the patch and realized that I can't try it easily in the near 
> > future
> > because the solution I'm working on is based on PG16 or earlier. This patch 
> > is
> > not easily applicable to the older releases. I have to port my solution to 
> > the
> > master, which is not done yet.
>
> We also tried to port our patch for PG16, but the largest barrier was that a
> replication command ALTER_SLOT is not supported. Since the slot option
> two_phase
> can't be modified, it is difficult to skip decoding PREPARE command even when
> altering the option from true to false.
> IIUC, Adding a new feature (e.g., replication command) for minor updates is
> generally
> prohibited
>
...

Attached patch set is a ported version for PG16, which breaks ABI. This can be 
used
for testing purpose, but it won't be pushed to REL_16_STABLE.
At least, this patchset can pass my github CI.

Can you apply and check whether your issue is solved?​​It is fantastic. 
Thank you for your help! I will definitely try your patch. I need some time to 
test and incorporate it. I also plan to port my stuff to the master branch to 
simplify testing of patches.

With best regards,
​Vitaly Davydov

 


RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-22 Thread Hayato Kuroda (Fujitsu)
> Dear Vitaly,
> 
> > I looked at the patch and realized that I can't try it easily in the near 
> > future
> > because the solution I'm working on is based on PG16 or earlier. This patch 
> > is
> > not easily applicable to the older releases. I have to port my solution to 
> > the
> > master, which is not done yet.
> 
> We also tried to port our patch for PG16, but the largest barrier was that a
> replication command ALTER_SLOT is not supported. Since the slot option
> two_phase
> can't be modified, it is difficult to skip decoding PREPARE command even when
> altering the option from true to false.
> IIUC, Adding a new feature (e.g., replication command) for minor updates is
> generally
> prohibited
> 
> We must consider another approach for backpatching, but we do not have
> solutions
> for now.

Attached patch set is a ported version for PG16, which breaks ABI. This can be 
used
for testing purpose, but it won't be pushed to REL_16_STABLE.
At least, this patchset can pass my github CI.

Can you apply and check whether your issue is solved?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 
 


REL_16_0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPTION.patch
Description:  REL_16_0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPTION.patch


REL_16_0002-Alter-slot-option-two_phase-only-when-altering-true-.patch
Description:  REL_16_0002-Alter-slot-option-two_phase-only-when-altering-true-.patch


REL_16_0003-Abort-prepared-transactions-while-altering-two_phase.patch
Description:  REL_16_0003-Abort-prepared-transactions-while-altering-two_phase.patch


REL_16_0004-Add-force_alter-option.patch
Description: REL_16_0004-Add-force_alter-option.patch


RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-22 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> Looking at this a bit more, maybe rolling back all prepared transactions on 
> the
> subscriber when toggling two_phase from true to false might not be desirable
> for the customer. Maybe we should have an option for customers to control
> whether transactions should be rolled back or not. Maybe transactions should
> only be rolled back if a "force" option is also set. What do people think?

And here is a patch for adds new option "force_alter" (better name is very 
welcome).
It could be used only when altering two_phase option. Let me share examples.

Assuming that there are logical replication system with two_phase = on, and
there are prepared transactions:

```
subscriber=# SELECT * FROM pg_prepared_xacts ;
 transaction |   gid|   prepared|  owner   | 
database 
-+--+---+--+--
 741 | pg_gid_16390_741 | 2024-04-22 08:02:34.727913+00 | postgres | 
postgres
 742 | pg_gid_16390_742 | 2024-04-22 08:02:34.729486+00 | postgres | 
postgres
(2 rows)
```

At that time, altering two_phase to false alone will be failed:

```
subscriber=# ALTER SUBSCRIPTION sub DISABLE ;
ALTER SUBSCRIPTION
subscriber=# ALTER SUBSCRIPTION sub SET (two_phase = off);
ERROR:  cannot alter two_phase = false when there are prepared transactions
```

It succeeds if force_alter is also expressly set. Prepared transactions will be
aborted at that time.

```
subscriber=# ALTER SUBSCRIPTION sub SET (two_phase = off, force_alter = on);
ALTER SUBSCRIPTION
subscriber=# SELECT * FROM pg_prepared_xacts ;
 transaction | gid | prepared | owner | database 
-+-+--+---+--
(0 rows)
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 



v5-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch
Description:  v5-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch


v5-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch
Description:  v5-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch


v5-0003-Abort-prepared-transactions-while-altering-two_ph.patch
Description:  v5-0003-Abort-prepared-transactions-while-altering-two_ph.patch


v5-0004-Add-force_alter-option.patch
Description: v5-0004-Add-force_alter-option.patch


RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-22 Thread Hayato Kuroda (Fujitsu)
Dear Vitaly,

> I looked at the patch and realized that I can't try it easily in the near 
> future
> because the solution I'm working on is based on PG16 or earlier. This patch is
> not easily applicable to the older releases. I have to port my solution to the
> master, which is not done yet.

We also tried to port our patch for PG16, but the largest barrier was that a
replication command ALTER_SLOT is not supported. Since the slot option two_phase
can't be modified, it is difficult to skip decoding PREPARE command even when
altering the option from true to false.
IIUC, Adding a new feature (e.g., replication command) for minor updates is 
generally
prohibited

We must consider another approach for backpatching, but we do not have solutions
for now.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/global/ 

 


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-22 Thread Ajin Cherian
On Thu, Apr 18, 2024 at 4:26 PM Ajin Cherian  wrote:

>
> Attaching the patch for your review and comments. Big thanks to Kuroda-san
> for also working on the patch.
>
>
Looking at this a bit more, maybe rolling back all prepared transactions on
the subscriber when toggling two_phase from true to false might not be
desirable for the customer. Maybe we should have an option for customers to
control whether transactions should be rolled back or not. Maybe
transactions should only be rolled back if a "force" option is also set.
What do people think?

regards,
Ajin Cherian
Fujitsu Australia


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-18 Thread Ajin Cherian
On Tue, Apr 16, 2024 at 4:25 PM Amit Kapila  wrote:

>
> >
>
> Can you please once consider the idea shared by me at [1] (One naive
> idea is that on the publisher .) to solve this problem?
>
> [1] -
> https://www.postgresql.org/message-id/CAA4eK1K1fSkeK%3Dkc26G5cq87vQG4%3D1qs_b%2Bno4%2Bep654SeBy1w%40mail.gmail.com
>
>
>
Expanding on Amit's idea, we found out that there is already a mechanism in
code to fully decode prepared transactions prior to a defined LSN where
two_phase is enabled using the "two_phase_at" LSN in the slot. Look at
ReorderBufferFinishPrepared() on how this is done. This code was not
working as expected in our patch because
we were setting two_phase on the slot to true as soon as the alter command
was received. This was not the correct way, initially when two_phase is
enabled, the two_phase changes to pending state and two_phase option on the
slot should only be set to true when two_phase moves from pending to
enabled. This will happen once the replication is restarted with two_phase
option. Look at code in  CreateDecodingContext() on how "two_phase_at" is
set in the slot when done this way. So we changed the code to not remotely
alter two_phase when toggling from false to true. With this change, now
even if there are pending transactions on the publisher when toggling
two_phase from false to true, these pending transactions will be fully
decoded and sent once the commit prepared is decoded as the pending
prepared transactions are prior to the "two_phase_at" LSN. With this patch,
now we are able to handle both pending prepared transactions when altering
two_phase from true to false as well as false to true.

Attaching the patch for your review and comments. Big thanks to Kuroda-san
for also working on the patch.

regards,
Ajin Cherian
Fujitsu Australia.


v4-0002-Alter-slot-option-two_phase-only-when-altering-tr.patch
Description: Binary data


v4-0003-Abort-prepared-transactions-while-altering-two_ph.patch
Description: Binary data


v4-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch
Description: Binary data


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-16 Thread Ajin Cherian
On Tue, Apr 16, 2024 at 1:31 AM Давыдов Виталий 
wrote:

> Dear All,
> Just interested, does anyone tried to reproduce the problem with slow
> catchup of twophase transactions (pgbench should be used with big number of
> clients)? I haven't seen any messages from anyone other that me that the
> problem takes place.
>
>
>

 Yes, I was able to reproduce the slow catchup of twophase transactions
with pgbench with 20 clients.

regards,
Ajin Cherian
Fujitsu Australia


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-16 Thread Amit Kapila
On Tue, Apr 16, 2024 at 7:48 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> > > FYI - We also considered the idea which walsender waits until all prepared
> > transactions
> > > are resolved before decoding and sending changes, but it did not work well
> > > - the restarted walsender sent only COMMIT PREPARED record for
> > transactions which
> > > have been prepared before disabling the subscription. This happened 
> > > because
> > > 1) if the two_phase option of slots is false, the confirmed_flush can be 
> > > ahead of
> > >PREPARE record, and
> > > 2) after the altering and restarting, start_decoding_at becomes same as
> > >confirmed_flush and records behind this won't be decoded.
> > >
> >
> > I don't understand the exact problem you are facing. IIUC, if the
> > commit is after start_decoding_at point and prepare was before it, we
> > expect to send the entire transaction followed by a commit record. The
> > restart_lsn should be before the start of such a transaction and we
> > should have recorded the changes in the reorder buffer.
>
> This behavior is right for two_phase = false case. But if the parameter is
> altered between PREPARE and COMMIT PREPARED, there is a possibility that only
> COMMIT PREPARED is sent.
>

Can you please once consider the idea shared by me at [1] (One naive
idea is that on the publisher .) to solve this problem?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1K1fSkeK%3Dkc26G5cq87vQG4%3D1qs_b%2Bno4%2Bep654SeBy1w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-15 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> > FYI - We also considered the idea which walsender waits until all prepared
> transactions
> > are resolved before decoding and sending changes, but it did not work well
> > - the restarted walsender sent only COMMIT PREPARED record for
> transactions which
> > have been prepared before disabling the subscription. This happened because
> > 1) if the two_phase option of slots is false, the confirmed_flush can be 
> > ahead of
> >PREPARE record, and
> > 2) after the altering and restarting, start_decoding_at becomes same as
> >confirmed_flush and records behind this won't be decoded.
> >
> 
> I don't understand the exact problem you are facing. IIUC, if the
> commit is after start_decoding_at point and prepare was before it, we
> expect to send the entire transaction followed by a commit record. The
> restart_lsn should be before the start of such a transaction and we
> should have recorded the changes in the reorder buffer.

This behavior is right for two_phase = false case. But if the parameter is
altered between PREPARE and COMMIT PREPARED, there is a possibility that only
COMMIT PREPARED is sent. As the first place, the executed workload is below.

1. created a subscription with (two_phase = false)
2. prepared a transaction on publisher
3. disabled the subscription once
4. altered the subscription to two_phase = true
5. enabled the subscription again
6. did COMMIT PREPARED on the publisher

-> Apply worker would raise an ERROR while applying COMMIT PREPARED record:
ERROR:  prepared transaction with identifier "pg_gid_XXX_YYY" does not exist

Below part describes why the ERROR occurred.

==

### Regarding 1) the confirmed_flush can be ahead of PREPARE record,

If two_phase is off, as you might know, confirmed_flush can be ahead of PREPARE
record by keepalive mechanism.

Walsender sometimes sends a keepalive message in WalSndKeepalive(). Here the LSN
is written, which is lastly decoded record. Since the PREPARE record is skipped
(just handled by ReorderBufferProcessXid()), sometimes the written LSN in the
message can be ahead of PREPARE record. If the WAL records are aligned like 
below,
the LSN can point CHECKPOINT_ONLINE.

...
INSERT
PREPARE txn1
CHECKPOINT_ONLINE
...

On worker side, when it receives the keepalive, it compares the LSN in the
message and lastly received LSN, and advance last_received. Then, the worker 
replies
to the walsender, and at that time it replies that last_recevied record has been
flushed on the subscriber. See send_feedback().
 
On publisher, when the walsender receives the message from subscriber, it reads
the message and advance the confirmed_flush to the written value. If the 
walsender
sends LSN which locates ahead PREPARE, the confirmed flush is updated as well.

### Regarding 2) after the altering, records behind the confirmed_flush are not 
decoded

Then, at decoding phase. The snapshot builder determines the point where 
decoding
is resumed, as start_decoding_at. After the restart, the value is same as
confirmed_flush of the slot. Since the confiremed_fluish is ahead of PREPARE,
the start_decoding_at becomes ahead as well, so whole of prepared transactions
are not decoded.

==

Attached zip file contains the PoC and used script. You can refer what I really 
did.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 

<>


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-15 Thread Давыдов Виталий

Dear All,

On Wednesday, April 10, 2024 17:16 MSK, Давыдов Виталий 
 wrote:
 Hi Amit, Ajin, All

Thank you for the patch and the responses. I apologize for my delayed answer 
due to some curcumstances.
On Wednesday, April 10, 2024 14:18 MSK, Amit Kapila  
wrote:

Vitaly, does the minimal solution provided by the proposed patch (Allow to 
alter two_phase option of a subscriber provided no uncommitted prepared 
transactions are pending on that subscription.) address your use case?In 
general, the idea behind the patch seems to be suitable for my case. 
Furthermore, the case of two_phase switch from false to true with uncommitted 
pending prepared transactions probably never happens in my case. The switch 
from false to true means that the replica completes the catchup from the master 
and switches to the normal mode when it participates in the multi-node 
configuration. There should be no uncommitted pending prepared transactions at 
the moment of the switch to the normal mode.

I'm going to try this patch. Give me please some time to investigate the patch. 
I will come with some feedback a little bit later.
I looked at the patch and realized that I can't try it easily in the near 
future because the solution I'm working on is based on PG16 or earlier. This 
patch is not easily applicable to the older releases. I have to port my 
solution to the master, which is not done yet. I apologize for that - so much 
work should be done before applying the patch. BTW, I tested the idea with 
async 2PC commit on my side and it seems to work fine in my case. Anyway, I 
agree, the idea with altering of subscription seems the best one but much 
harder to implement.

To summarise my case of a synchronous multimaster where twophase is used to 
implement global transactions:
 * Replica may have prepared but not committed transactions when I toggle 
subscription twophase from true to false. In this case, all prepared 
transactions may be aborted on the replica before altering the subscription. * 
Replica will not have prepared transactions when subscription is toggled from 
false to true. In this scenario, the replica completes the catchup (with 
twophase=off) and becomes the part of the multi-nodal cluster and is ready to 
accept new 2PC transactions. All the new pending transactions will wait until 
replica responds. But it may work differently for some other solutions. In 
general, it would be great to allow toggling for all scenarious.Just 
interested, does anyone tried to reproduce the problem with slow catchup of 
twophase transactions (pgbench should be used with big number of clients)? I 
haven't seen any messages from anyone other that me that the problem takes 
place.

Thank you for your help!

With best regards,
Vitaly










 


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-15 Thread Amit Kapila
On Mon, Apr 15, 2024 at 1:28 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > Vitaly, does the minimal solution provided by the proposed patch
> > (Allow to alter two_phase option of a subscriber provided no
> > uncommitted
> > prepared transactions are pending on that subscription.) address your use 
> > case?
>
> I think we do not have to handle cases which there are prepared transactions 
> on
> publisher/subscriber, as the first step. It leads additional complexity and we
> do not have smarter solutions, especially for problem 2.
> IIUC it meets the Vitaly's condition, right?
>
> > > 1. While toggling two_phase from true to false, we could probably get a 
> > > list of
> > prepared transactions for this subscriber id and rollback/abort the prepared
> > transactions. This will allow the transactions to be re-applied like a 
> > normal
> > transaction when the commit comes. Alternatively, if this isn't appropriate 
> > doing it
> > in the ALTER SUBSCRIPTION context, we could store the xids of all prepared
> > transactions of this subscription in a list and when the corresponding xid 
> > is being
> > committed by the apply worker, prior to commit, we make sure the previously
> > prepared transaction is rolled back. But this would add the overhead of 
> > checking
> > this list every time a transaction is committed by the apply worker.
> > >
> >
> > In the second solution, if you check at the time of commit whether
> > there exists a prior prepared transaction then won't we end up
> > applying the changes twice? I think we can first try to achieve it at
> > the time of Alter Subscription because the other solution can add
> > overhead at each commit?
>
> Yeah, at least the second solution might be problematic. I prototyped
> the first one and worked well. However, to make the feature more consistent,
> it is prohibit to exist prepared transactions on subscriber for now.
> We can ease based on the requirement.
>
> > > 2. No solution yet.
> > >
> >
> > One naive idea is that on the publisher we can remember whether the
> > prepare has been sent and if so then only send commit_prepared,
> > otherwise send the entire transaction. On the subscriber-side, we
> > somehow, need to ensure before applying the first change whether the
> > corresponding transaction is already prepared and if so then skip the
> > changes and just perform the commit prepared. One drawback of this
> > approach is that after restart, the prepare flag wouldn't be saved in
> > the memory and we end up sending the entire transaction again. One way
> > to avoid this overhead is that the publisher before sending the entire
> > transaction checks with subscriber whether it has a prepared
> > transaction corresponding to the current commit. I understand that
> > this is not a good idea even if it works but I don't have any better
> > ideas. What do you think?
>
> I considered but not sure it is good to add such mechanism. Your idea requires
> additional wait-loop, which might lead bugs and unexpected behavior. And it 
> may
> degrade the performance based on the network environment.
> As for the another solution (worker sends a list of prepared transactions), it
> is also not so good because list of prepared transactions may be huge.
>
> Based on above, I think we can reject the case for now.
>
> FYI - We also considered the idea which walsender waits until all prepared 
> transactions
> are resolved before decoding and sending changes, but it did not work well
> - the restarted walsender sent only COMMIT PREPARED record for transactions 
> which
> have been prepared before disabling the subscription. This happened because
> 1) if the two_phase option of slots is false, the confirmed_flush can be 
> ahead of
>PREPARE record, and
> 2) after the altering and restarting, start_decoding_at becomes same as
>confirmed_flush and records behind this won't be decoded.
>

I don't understand the exact problem you are facing. IIUC, if the
commit is after start_decoding_at point and prepare was before it, we
expect to send the entire transaction followed by a commit record. The
restart_lsn should be before the start of such a transaction and we
should have recorded the changes in the reorder buffer.

-- 
With Regards,
Amit Kapila.




RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-15 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> Vitaly, does the minimal solution provided by the proposed patch
> (Allow to alter two_phase option of a subscriber provided no
> uncommitted
> prepared transactions are pending on that subscription.) address your use 
> case?

I think we do not have to handle cases which there are prepared transactions on
publisher/subscriber, as the first step. It leads additional complexity and we
do not have smarter solutions, especially for problem 2.
IIUC it meets the Vitaly's condition, right?

> > 1. While toggling two_phase from true to false, we could probably get a 
> > list of
> prepared transactions for this subscriber id and rollback/abort the prepared
> transactions. This will allow the transactions to be re-applied like a normal
> transaction when the commit comes. Alternatively, if this isn't appropriate 
> doing it
> in the ALTER SUBSCRIPTION context, we could store the xids of all prepared
> transactions of this subscription in a list and when the corresponding xid is 
> being
> committed by the apply worker, prior to commit, we make sure the previously
> prepared transaction is rolled back. But this would add the overhead of 
> checking
> this list every time a transaction is committed by the apply worker.
> >
> 
> In the second solution, if you check at the time of commit whether
> there exists a prior prepared transaction then won't we end up
> applying the changes twice? I think we can first try to achieve it at
> the time of Alter Subscription because the other solution can add
> overhead at each commit?

Yeah, at least the second solution might be problematic. I prototyped
the first one and worked well. However, to make the feature more consistent,
it is prohibit to exist prepared transactions on subscriber for now.
We can ease based on the requirement.

> > 2. No solution yet.
> >
> 
> One naive idea is that on the publisher we can remember whether the
> prepare has been sent and if so then only send commit_prepared,
> otherwise send the entire transaction. On the subscriber-side, we
> somehow, need to ensure before applying the first change whether the
> corresponding transaction is already prepared and if so then skip the
> changes and just perform the commit prepared. One drawback of this
> approach is that after restart, the prepare flag wouldn't be saved in
> the memory and we end up sending the entire transaction again. One way
> to avoid this overhead is that the publisher before sending the entire
> transaction checks with subscriber whether it has a prepared
> transaction corresponding to the current commit. I understand that
> this is not a good idea even if it works but I don't have any better
> ideas. What do you think?

I considered but not sure it is good to add such mechanism. Your idea requires
additional wait-loop, which might lead bugs and unexpected behavior. And it may
degrade the performance based on the network environment.
As for the another solution (worker sends a list of prepared transactions), it
is also not so good because list of prepared transactions may be huge.

Based on above, I think we can reject the case for now.

FYI - We also considered the idea which walsender waits until all prepared 
transactions
are resolved before decoding and sending changes, but it did not work well
- the restarted walsender sent only COMMIT PREPARED record for transactions 
which
have been prepared before disabling the subscription. This happened because
1) if the two_phase option of slots is false, the confirmed_flush can be ahead 
of
   PREPARE record, and
2) after the altering and restarting, start_decoding_at becomes same as
   confirmed_flush and records behind this won't be decoded.

> > 3. We could mandate that the altering of two_phase state only be done after
> disabling the subscription, just like how it is handled for failover option.
> >
> 
> makes sense.

OK, this spec was added.

According to above, I updated the patch with Ajin.
0001 - extends ALTER SUBSCRIPTION statement. A tab-completion was added.
0002 - mandates the subscription has been disabled. Since no need to change 
   AtEOXact_ApplyLauncher(), the change is reverted.
   If no objections, this can be included to 0001.
0003 - checks whether there are transactions prepared by the worker. If found,
   rejects the ALTER SUBSCRIPTION command.
0004 - checks whether there are transactions prepared on publisher. The backend
   connects to the publisher and confirms it. If found, rejects the ALTER
   SUBSCRIPTION command.
0005 - adds TAP test for it.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



v3-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch
Description:  v3-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch


v3-0002-Mandate-the-subscription-has-been-disabled.patch
Description: v3-0002-Mandate-the-subscription-has-been-disabled.patch


v3-0003-Prohibit-altering-from-true-to-false-if-there-are.patch
Description:  

RE: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-10 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

> One naive idea is that on the publisher we can remember whether the
> prepare has been sent and if so then only send commit_prepared,
> otherwise send the entire transaction. On the subscriber-side, we
> somehow, need to ensure before applying the first change whether the
> corresponding transaction is already prepared and if so then skip the
> changes and just perform the commit prepared. One drawback of this
> approach is that after restart, the prepare flag wouldn't be saved in
> the memory and we end up sending the entire transaction again. One way
> to avoid this overhead is that the publisher before sending the entire
> transaction checks with subscriber whether it has a prepared
> transaction corresponding to the current commit. I understand that
> this is not a good idea even if it works but I don't have any better
> ideas. What do you think?

Alternative idea is that worker pass a list of prepared transaction as new
option in START_REPLICATION. This can reduce the number of inter-node
communications, but sometimes the list may be huge.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-10 Thread Давыдов Виталий

Hi Amit, Ajin, All

Thank you for the patch and the responses. I apologize for my delayed answer 
due to some curcumstances.
On Wednesday, April 10, 2024 14:18 MSK, Amit Kapila  
wrote:

Vitaly, does the minimal solution provided by the proposed patch (Allow to 
alter two_phase option of a subscriber provided no uncommitted prepared 
transactions are pending on that subscription.) address your use case?In 
general, the idea behind the patch seems to be suitable for my case. 
Furthermore, the case of two_phase switch from false to true with uncommitted 
pending prepared transactions probably never happens in my case. The switch 
from false to true means that the replica completes the catchup from the master 
and switches to the normal mode when it participates in the multi-node 
configuration. There should be no uncommitted pending prepared transactions at 
the moment of the switch to the normal mode.

I'm going to try this patch. Give me please some time to investigate the patch. 
I will come with some feedback a little bit later.

Thank you for your help!

With best regards,
Vitaly Davydov


 


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-10 Thread Amit Kapila
On Fri, Apr 5, 2024 at 4:59 PM Ajin Cherian  wrote:
>
> On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila  wrote:
>>
>>
>> I think this would probably be better than the current situation but
>> can we think of a solution to allow toggling the value of two_phase
>> even when prepared transactions are present? Can you please summarize
>> the reason for the problems in doing that and the solutions, if any?
>>
>
>
> Updated the patch, as it wasn't addressing updating of two-phase in the 
> remote slot.
>

Vitaly, does the minimal solution provided by the proposed patch
(Allow to alter two_phase option of a subscriber provided no
uncommitted
prepared transactions are pending on that subscription.) address your use case?

>  Currently the main issue that needs to be handled is the handling of pending 
> prepared transactions while the two_phase is altered. I see 3 issues with the 
> current approach.
>
> 1. Uncommitted prepared transactions when toggling two_phase from true to 
> false
>   When two_phase was true, prepared transactions were decoded at PREPARE time 
> and send to the subscriber, which is then prepared on the subscriber with a 
> new gid. Once the two_phase is toggled to false, then the COMMIT PREPARED on 
> the publisher is converted to commit and the entire transaction is decoded 
> and sent to the subscriber. This will   leave the previously prepared 
> transaction pending.
>
> 2. Uncommitted prepared transactions when toggling two_phase form false to 
> true
>   When two_phase was false, prepared transactions were ignored and not 
> decoded at PREPARE time on the publisher. Once the two_phase is toggled to 
> true, the apply worker and the walsender are restarted and a replication is 
> restarted from a new "start_decoding_at" LSN. Now, this new 
> "start_decoding_at" could be past the LSN of the PREPARE record and if so, 
> the PREPARE record is skipped and not send to the subscriber. Look at 
> comments in DecodeTXNNeedSkip() for detail.  Later when the user issues 
> COMMIT PREPARED, this is decoded and sent to the subscriber. but there is no 
> prepared transaction on the subscriber, and this fails because the  
> corresponding gid of the transaction couldn't be found.
>
> 3. While altering the two_phase of the subscription, it is required to also 
> alter the two_phase field of the slot on the primary. The subscription cannot 
> remotely alter the two_phase option of the slot when the subscription is  
> enabled, as the slot is owned by the walsender on the publisher side.
>

Thanks for summarizing the reasons for not allowing altering the
two_pc property for a subscription.

> Possible solutions for the 3 problems:
>
> 1. While toggling two_phase from true to false, we could probably get a list 
> of prepared transactions for this subscriber id and rollback/abort the 
> prepared transactions. This will allow the transactions to be re-applied like 
> a normal transaction when the commit comes. Alternatively, if this isn't 
> appropriate doing it in the ALTER SUBSCRIPTION context, we could store the 
> xids of all prepared transactions of this subscription in a list and when the 
> corresponding xid is being committed by the apply worker, prior to commit, we 
> make sure the previously prepared transaction is rolled back. But this would 
> add the overhead of checking this list every time a transaction is committed 
> by the apply worker.
>

In the second solution, if you check at the time of commit whether
there exists a prior prepared transaction then won't we end up
applying the changes twice? I think we can first try to achieve it at
the time of Alter Subscription because the other solution can add
overhead at each commit?

> 2. No solution yet.
>

One naive idea is that on the publisher we can remember whether the
prepare has been sent and if so then only send commit_prepared,
otherwise send the entire transaction. On the subscriber-side, we
somehow, need to ensure before applying the first change whether the
corresponding transaction is already prepared and if so then skip the
changes and just perform the commit prepared. One drawback of this
approach is that after restart, the prepare flag wouldn't be saved in
the memory and we end up sending the entire transaction again. One way
to avoid this overhead is that the publisher before sending the entire
transaction checks with subscriber whether it has a prepared
transaction corresponding to the current commit. I understand that
this is not a good idea even if it works but I don't have any better
ideas. What do you think?

> 3. We could mandate that the altering of two_phase state only be done after 
> disabling the subscription, just like how it is handled for failover option.
>

makes sense.

-- 
With Regards,
Amit Kapila.




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-05 Thread Ajin Cherian
On Thu, Apr 4, 2024 at 4:38 PM Amit Kapila  wrote:

>
> I think this would probably be better than the current situation but
> can we think of a solution to allow toggling the value of two_phase
> even when prepared transactions are present? Can you please summarize
> the reason for the problems in doing that and the solutions, if any?
>
> --
> With Regards,
> Amit Kapila.
>

Updated the patch, as it wasn't addressing updating of two-phase in the
remote slot.

 Currently the main issue that needs to be handled is the handling of
pending prepared transactions while the two_phase is altered. I see 3
issues with the current approach.

1. Uncommitted prepared transactions when toggling two_phase from true to
false
When two_phase was true, prepared transactions were decoded at PREPARE time
and send to the subscriber, which is then prepared on the subscriber with a
new gid. Once the two_phase is toggled to false, then the COMMIT PREPARED
on the publisher is converted to commit and the entire transaction is
decoded and sent to the subscriber. This will   leave the previously
prepared transaction pending.

2. Uncommitted prepared transactions when toggling two_phase form false to
true
When two_phase was false, prepared transactions were ignored and not
decoded at PREPARE time on the publisher. Once the two_phase is toggled to
true, the apply worker and the walsender are restarted and a replication is
restarted from a new "start_decoding_at" LSN. Now, this new
"start_decoding_at" could be past the LSN of the PREPARE record and if so,
the PREPARE record is skipped and not send to the subscriber. Look at
comments in DecodeTXNNeedSkip() for detail.  Later when the user issues
COMMIT PREPARED, this is decoded and sent to the subscriber. but there is
no prepared transaction on the subscriber, and this fails because the
corresponding gid of the transaction couldn't be found.

3. While altering the two_phase of the subscription, it is required to also
alter the two_phase field of the slot on the primary. The subscription
cannot remotely alter the two_phase option of the slot when the
subscription is  enabled, as the slot is owned by the walsender on the
publisher side.

Possible solutions for the 3 problems:

1. While toggling two_phase from true to false, we could probably get list
of prepared transactions for this subscriber id and rollback/abort the
prepared transactions. This will allow the transactions to be re-applied
like a normal transaction when the commit comes. Alternatively, if this
isn't appropriate doing it in the ALTER SUBSCRIPTION context, we could
store the xids of all prepared transactions of this subscription in a list
and when the corresponding xid is being committed by the apply worker,
prior to commit, we make sure the previously prepared transaction is rolled
back. But this would add the overhead of checking this list every time a
transaction is committed by the apply worker.

2. No solution yet.

3. We could mandate that the altering of two_phase state only be done after
disabling the subscription, just like how it is handled for failover option.
Let me know your thoughts.

regards,
Ajin Cherian
Fujitsu Australia


v2-0001-Allow-altering-of-two_phase-option-of-a-SUBSCRIPT.patch
Description: Binary data


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-03 Thread Amit Kapila
On Thu, Apr 4, 2024 at 10:53 AM Ajin Cherian  wrote:
>
> On Wed, Mar 6, 2024 at 1:29 AM Давыдов Виталий  
> wrote:
>>
>> In usual work, the subscription has two_phase = on. I have to change this 
>> option at catchup stage only, but this parameter can not be altered. There 
>> was a patch proposal in past to implement altering of two_phase option, but 
>> it was rejected. I think, the recreation of the subscription with two_phase 
>> = off will not work.
>>
>>
>
> The altering of two_phase was restricted because if there was a previously 
> prepared transaction on the subscriber when the two_phase was on, and then it 
> was turned off, the apply worker on the subscriber would re-apply the 
> transaction a second time and this might result in an inconsistent replica.
> Here's a patch that allows toggling two_phase option provided that there are 
> no pending uncommitted prepared transactions on the subscriber for that 
> subscription.
>

I think this would probably be better than the current situation but
can we think of a solution to allow toggling the value of two_phase
even when prepared transactions are present? Can you please summarize
the reason for the problems in doing that and the solutions, if any?

-- 
With Regards,
Amit Kapila.




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-04-03 Thread Ajin Cherian
On Wed, Mar 6, 2024 at 1:29 AM Давыдов Виталий 
wrote:

> In usual work, the subscription has two_phase = on. I have to change this
> option at catchup stage only, but this parameter can not be altered. There
> was a patch proposal in past to implement altering of two_phase option, but
> it was rejected. I think, the recreation of the subscription with two_phase
> = off will not work.
>
>
>
The altering of two_phase was restricted because if there was a previously
prepared transaction on the subscriber when the two_phase was on, and then
it was turned off, the apply worker on the subscriber would re-apply the
transaction a second time and this might result in an inconsistent replica.
Here's a patch that allows toggling two_phase option provided that there
are no pending uncommitted prepared transactions on the subscriber for that
subscription.

Thanks to Kuroda-san for working on the patch.

regards,
Ajin Cherian
Fujitsu Australia


v1-0001-Allow-altering-of-two_phase-option-in-subscribers.patch
Description: Binary data


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-03-06 Thread Amit Kapila
On Tue, Mar 5, 2024 at 7:59 PM Давыдов Виталий  wrote:
>
> Thank you for the reply.
>
> On Tuesday, March 05, 2024 12:05 MSK, Heikki Linnakangas  
> wrote:
>
>
> In a nutshell, this changes PREPARE TRANSACTION so that if
> synchronous_commit is 'off', the PREPARE TRANSACTION is not fsync'd to
> disk. So if you crash after the PREPARE TRANSACTION has returned, the
> transaction might be lost. I think that's completely unacceptable.
>
>
> You are right, the prepared transaction might be lost after crash. The same 
> may happen with regular transactions that are not fsync-ed on replica in 
> logical replication by default. The subscription parameter synchronous_commit 
> is OFF by default. I'm not sure, is there some auto recovery for regular 
> transactions?
>

Unless the commit WAL is not flushed, we wouldn't have updated the
replication origin's LSN and neither the walsender would increase the
confirmed_flush_lsn for the corresponding slot till the commit is
flushed on subscriber. So, if the subscriber crashed before flushing
the commit record, server should send the same transaction again. The
same should be true for prepared transaction stuff as well.

-- 
With Regards,
Amit Kapila.




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-03-05 Thread Давыдов Виталий

Hi Heikki,

Thank you for the reply.

On Tuesday, March 05, 2024 12:05 MSK, Heikki Linnakangas  
wrote:
 In a nutshell, this changes PREPARE TRANSACTION so that if
synchronous_commit is 'off', the PREPARE TRANSACTION is not fsync'd to
disk. So if you crash after the PREPARE TRANSACTION has returned, the
transaction might be lost. I think that's completely unacceptable.​
You are right, the prepared transaction might be lost after crash. The same may 
happen with regular transactions that are not fsync-ed on replica in logical 
replication by default. The subscription parameter synchronous_commit is OFF by 
default. I'm not sure, is there some auto recovery for regular transactions? I 
think, the main difference between these two cases - how to manually recover 
when some PREPARE TRANSACTION or COMMIT PREPARED are lost. For regular 
transactions, some updates or deletes in tables on replica may be enough to fix 
the problem. For twophase transactions, it may be harder to fix it by hands, 
but it is possible, I believe. If you create a custom solution that is based on 
twophase transactions (like multimaster) such auto recovery may happen 
automatically. Another solution is to ignore errors on commit prepared if the 
corresponding prepared tx is missing. I don't know other risks that may happen 
with async commit of twophase transactions.
 If you're ok to lose the prepared state of twophase transactions on
crash, why don't you create the subscription with 'two_phase=off' to
begin with?In usual work, the subscription has two_phase = on. I have to change 
this option at catchup stage only, but this parameter can not be altered. There 
was a patch proposal in past to implement altering of two_phase option, but it 
was rejected. I think, the recreation of the subscription with two_phase = off 
will not work.

I believe, async commit for twophase transactions on catchup will significantly 
improve the catchup performance. It is worth to think about such feature.

P.S. We might introduce a GUC option to allow async commit for twophase 
transactions. By default, sync commit will be applied for twophase 
transactions, as it is now.

With best regards,
Vitaly Davydov


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-03-05 Thread Heikki Linnakangas

On 29/02/2024 19:34, Давыдов Виталий wrote:

Dear All,

Consider, please, my patch for async commit for twophase transactions. 
It can be applicable when catchup performance is not enought with 
publication parameter twophase = on.


The key changes are:

  * Use XLogSetAsyncXactLSN instead of XLogFlush as it is for usual
transactions.
  * In case of async commit only, save 2PC state in the pg_twophase file
(but not fsync it) in addition to saving in the WAL. The file is
used as an alternative to storing 2pc state in the memory.
  * On recovery, reject pg_twophase files with future xids.

Probably, 2PC async commit should be enabled by a GUC (not implemented 
in the patch).


In a nutshell, this changes PREPARE TRANSACTION so that if 
synchronous_commit is 'off', the PREPARE TRANSACTION is not fsync'd to 
disk. So if you crash after the PREPARE TRANSACTION has returned, the 
transaction might be lost. I think that's completely unacceptable.


If you're ok to lose the prepared state of twophase transactions on 
crash, why don't you create the subscription with 'two_phase=off' to 
begin with?


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-29 Thread Давыдов Виталий

Dear All,

Consider, please, my patch for async commit for twophase transactions. It can 
be applicable when catchup performance is not enought with publication 
parameter twophase = on.

The key changes are:
 * Use XLogSetAsyncXactLSN instead of XLogFlush as it is for usual 
transactions. * In case of async commit only, save 2PC state in the pg_twophase 
file (but not fsync it) in addition to saving in the WAL. The file is used as 
an alternative to storing 2pc state in the memory. * On recovery, reject 
pg_twophase files with future xids.Probably, 2PC async commit should be enabled 
by a GUC (not implemented in the patch).

With best regards,
Vitaly


 
From cbaaa7270d771f9ccd6def08f0f02ce61dc15ff6 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov 
Date: Thu, 29 Feb 2024 18:58:13 +0300
Subject: [PATCH] Async commit support for twophase transactions

---
 src/backend/access/transam/twophase.c | 171 +-
 1 file changed, 138 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index 234c8d08eb..352266be14 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -109,6 +109,8 @@
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
+#define POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT
+
 /*
  * Directory where Two-phase commit files reside within PGDATA
  */
@@ -169,6 +171,7 @@ typedef struct GlobalTransactionData
 	BackendId	locking_backend;	/* backend currently working on the xact */
 	bool		valid;			/* true if PGPROC entry is in proc array */
 	bool		ondisk;			/* true if prepare state file is on disk */
+	bool		infile;			/* true if prepared state saved in file (but not fsync-ed) */
 	bool		inredo;			/* true if entry was added via xlog_redo */
 	char		gid[GIDSIZE];	/* The GID assigned to the prepared xact */
 }			GlobalTransactionData;
@@ -227,12 +230,14 @@ static void RemoveGXact(GlobalTransaction gxact);
 static void XlogReadTwoPhaseData(XLogRecPtr lsn, char **buf, int *len);
 static char *ProcessTwoPhaseBuffer(TransactionId xid,
    XLogRecPtr prepare_start_lsn,
-   bool fromdisk, bool setParent, bool setNextXid);
+   bool fromdisk, bool setParent, bool setNextXid,
+   const char *filename);
 static void MarkAsPreparingGuts(GlobalTransaction gxact, TransactionId xid,
 const char *gid, TimestampTz prepared_at, Oid owner,
 Oid databaseid);
 static void RemoveTwoPhaseFile(TransactionId xid, bool giveWarning);
-static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len);
+static void RemoveTwoPhaseFileByName(const char *filename, bool giveWarning);
+static void RecreateTwoPhaseFile(TransactionId xid, void *content, int len, bool dosync);
 
 /*
  * Initialization of shared memory
@@ -427,6 +432,7 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 
 	MarkAsPreparingGuts(gxact, xid, gid, prepared_at, owner, databaseid);
 
+	gxact->infile = false;
 	gxact->ondisk = false;
 
 	/* And insert it into the active array */
@@ -1204,6 +1210,37 @@ EndPrepare(GlobalTransaction gxact)
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  errmsg("two-phase state file maximum length exceeded")));
 
+#ifdef POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT
+
+	Assert(gxact->infile == false);
+
+	if (synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
+	{
+		char		   *buf;
+		size_t			len = 0;
+		size_t			offset = 0;
+
+		for (record = records.head; record != NULL; record = record->next)
+			len += record->len;
+
+		if (len > 0)
+		{
+			buf = palloc(len);
+
+			for (record = records.head; record != NULL; record = record->next)
+			{
+memcpy(buf + offset, record->data, record->len);
+offset += record->len;
+			}
+
+			RecreateTwoPhaseFile(gxact->xid, buf, len, false);
+			pfree(buf);
+			gxact->infile = true;
+		}
+	}
+
+#endif
+
 	/*
 	 * Now writing 2PC state data to WAL. We let the WAL's CRC protection
 	 * cover us, so no need to calculate a separate CRC.
@@ -1239,8 +1276,24 @@ EndPrepare(GlobalTransaction gxact)
    gxact->prepare_end_lsn);
 	}
 
+#if !defined(POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT)
+
 	XLogFlush(gxact->prepare_end_lsn);
 
+#else
+
+	if (synchronous_commit > SYNCHRONOUS_COMMIT_OFF)
+	{
+		/* Flush XLOG to disk */
+		XLogFlush(gxact->prepare_end_lsn);
+	}
+	else
+	{
+		XLogSetAsyncXactLSN(gxact->prepare_end_lsn);
+	}
+
+#endif
+
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
 	/* Store record's start location to read that later on Commit */
@@ -1546,12 +1599,11 @@ FinishPreparedTransaction(const char *gid, bool isCommit)
 	 * in WAL files if the LSN is after the last checkpoint record, or moved
 	 * to disk if for some reason they have lived for a long time.
 	 */
-	if (gxact->ondisk)
+	if (gxact->infile || gxact->ondisk)
 		buf = ReadTwoPhaseFile(xid, false);
 	else
 		XlogReadTwoPhaseData(gxact->prepare_start_lsn, , NULL);
 
-
 	/*
 	 * Disassemble the header area
 	 */
@@ 

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-27 Thread Давыдов Виталий

Hi Amit,

On Tuesday, February 27, 2024 16:00 MSK, Amit Kapila  
wrote:
As we do XLogFlush() at the time of prepare then why it is not available? OR 
are you talking about this state after your idea/patch where you are trying to 
make both Prepare and Commit_prepared records async?Right, I'm talking about my 
patch where async commit is implemented. There is no such problem with reading 
2PC from not flushed WAL in the vanilla because XLogFlush is called 
unconditionally, as you've described. But an attempt to add some async stuff 
leads to the problem of reading not flushed WAL. It is why I store 2pc state in 
the local memory in my patch.
It would be good if you could link those threads.Sure, I will find and add some 
links to the discussions from past.

Thank you!

With best regards,
Vitaly
 On Tue, Feb 27, 2024 at 4:49 PM Давыдов Виталий
 wrote:
>
> Thank you for your interest in the discussion!
>
> On Monday, February 26, 2024 16:24 MSK, Amit Kapila  
> wrote:
>
>
> I think the reason is probably that when the WAL record for prepared is 
> already flushed then what will be the idea of async commit here?
>
> I think, the idea of async commit should be applied for both transactions: 
> PREPARE and COMMIT PREPARED, which are actually two separate local 
> transactions. For both these transactions we may call XLogSetAsyncXactLSN on 
> commit instead of XLogFlush when async commit is enabled. When I use async 
> commit, I mean to apply async commit to local transactions, not to a twophase 
> (prepared) transaction itself.
>
>
> At commit prepared, it seems we read prepare's WAL record, right? If so, it 
> is not clear to me do you see a problem with a flush of commit_prepared or 
> reading WAL for prepared or both of these.
>
> The problem with reading WAL is due to async commit of PREPARE TRANSACTION 
> which saves 2PC in the WAL. At the moment of COMMIT PREPARED the WAL with 
> PREPARE TRANSACTION 2PC state may not be XLogFlush-ed yet.
>

As we do XLogFlush() at the time of prepare then why it is not
available? OR are you talking about this state after your idea/patch
where you are trying to make both Prepare and Commit_prepared records
async?

So, PREPARE TRANSACTION should wait until its 2PC state is flushed.
>
> I did some experiments with saving 2PC state in the local memory of logical 
> replication worker and, I think, it worked and demonstrated much better 
> performance. Logical replication worker utilized up to 100% CPU. I'm just 
> concerned about possible problems with async commit for twophase transactions.
>
> To be more specific, I've attached a patch to support async commit for 
> twophase. It is not the final patch but it is presented only for discussion 
> purposes. There were some attempts to save 2PC in memory in past but it was 
> rejected.
>

It would be good if you could link those threads.

--
With Regards,
Amit Kapila.

 

 


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-27 Thread Amit Kapila
On Tue, Feb 27, 2024 at 4:49 PM Давыдов Виталий
 wrote:
>
> Thank you for your interest in the discussion!
>
> On Monday, February 26, 2024 16:24 MSK, Amit Kapila  
> wrote:
>
>
> I think the reason is probably that when the WAL record for prepared is 
> already flushed then what will be the idea of async commit here?
>
> I think, the idea of async commit should be applied for both transactions: 
> PREPARE and COMMIT PREPARED, which are actually two separate local 
> transactions. For both these transactions we may call XLogSetAsyncXactLSN on 
> commit instead of XLogFlush when async commit is enabled. When I use async 
> commit, I mean to apply async commit to local transactions, not to a twophase 
> (prepared) transaction itself.
>
>
> At commit prepared, it seems we read prepare's WAL record, right? If so, it 
> is not clear to me do you see a problem with a flush of commit_prepared or 
> reading WAL for prepared or both of these.
>
> The problem with reading WAL is due to async commit of PREPARE TRANSACTION 
> which saves 2PC in the WAL. At the moment of COMMIT PREPARED the WAL with 
> PREPARE TRANSACTION 2PC state may not be XLogFlush-ed yet.
>

As we do XLogFlush() at the time of prepare then why it is not
available? OR are you talking about this state after your idea/patch
where you are trying to make both Prepare and Commit_prepared records
async?

 So, PREPARE TRANSACTION should wait until its 2PC state is flushed.
>
> I did some experiments with saving 2PC state in the local memory of logical 
> replication worker and, I think, it worked and demonstrated much better 
> performance. Logical replication worker utilized up to 100% CPU. I'm just 
> concerned about possible problems with async commit for twophase transactions.
>
> To be more specific, I've attached a patch to support async commit for 
> twophase. It is not the final patch but it is presented only for discussion 
> purposes. There were some attempts to save 2PC in memory in past but it was 
> rejected.
>

It would be good if you could link those threads.

-- 
With Regards,
Amit Kapila.




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-27 Thread Давыдов Виталий

Hi Amit,

Thank you for your interest in the discussion!

On Monday, February 26, 2024 16:24 MSK, Amit Kapila  
wrote:
 
I think the reason is probably that when the WAL record for prepared is already 
flushed then what will be the idea of async commit here?I think, the idea of 
async commit should be applied for both transactions: PREPARE and COMMIT 
PREPARED, which are actually two separate local transactions. For both these 
transactions we may call XLogSetAsyncXactLSN on commit instead of XLogFlush 
when async commit is enabled. When I use async commit, I mean to apply async 
commit to local transactions, not to a twophase (prepared) transaction itself.
 
At commit prepared, it seems we read prepare's WAL record, right? If so, it is 
not clear to me do you see a problem with a flush of commit_prepared or reading 
WAL for prepared or both of these.The problem with reading WAL is due to async 
commit of PREPARE TRANSACTION which saves 2PC in the WAL. At the moment of 
COMMIT PREPARED the WAL with PREPARE TRANSACTION 2PC state may not be 
XLogFlush-ed yet. So, PREPARE TRANSACTION should wait until its 2PC state is 
flushed.

I did some experiments with saving 2PC state in the local memory of logical 
replication worker and, I think, it worked and demonstrated much better 
performance. Logical replication worker utilized up to 100% CPU. I'm just 
concerned about possible problems with async commit for twophase transactions.

To be more specific, I've attached a patch to support async commit for 
twophase. It is not the final patch but it is presented only for discussion 
purposes. There were some attempts to save 2PC in memory in past but it was 
rejected. Now, there might be the second round to discuss it.

With best regards,
Vitaly

 
From 549f809fa122ca0842ec4bfc775afd08feee0d80 Mon Sep 17 00:00:00 2001
From: Vitaly Davydov 
Date: Tue, 27 Feb 2024 14:02:23 +0300
Subject: [PATCH] Add asynchronous commit support for 2PC

---
 src/backend/access/transam/twophase.c | 111 +-
 1 file changed, 108 insertions(+), 3 deletions(-)

diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c
index c6af8cfd7e..52f0853db8 100644
--- a/src/backend/access/transam/twophase.c
+++ b/src/backend/access/transam/twophase.c
@@ -109,6 +109,8 @@
 #include "utils/memutils.h"
 #include "utils/timestamp.h"
 
+#define POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT
+
 /*
  * Directory where Two-phase commit files reside within PGDATA
  */
@@ -163,6 +165,9 @@ typedef struct GlobalTransactionData
 	 */
 	XLogRecPtr	prepare_start_lsn;	/* XLOG offset of prepare record start */
 	XLogRecPtr	prepare_end_lsn;	/* XLOG offset of prepare record end */
+	void*   prepare_2pc_mem_data;
+	size_t  prepare_2pc_mem_len;
+	pid_t   prepare_2pc_proc;
 	TransactionId xid;			/* The GXACT id */
 
 	Oid			owner;			/* ID of user that executed the xact */
@@ -427,6 +432,9 @@ MarkAsPreparing(TransactionId xid, const char *gid,
 
 	MarkAsPreparingGuts(gxact, xid, gid, prepared_at, owner, databaseid);
 
+	Assert(gxact->prepare_2pc_mem_data == NULL);
+	Assert(gxact->prepare_2pc_proc == 0);
+
 	gxact->ondisk = false;
 
 	/* And insert it into the active array */
@@ -1129,6 +1137,8 @@ StartPrepare(GlobalTransaction gxact)
 	}
 }
 
+extern bool IsLogicalWorker(void);
+
 /*
  * Finish preparing state data and writing it to WAL.
  */
@@ -1167,6 +1177,37 @@ EndPrepare(GlobalTransaction gxact)
 (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
  errmsg("two-phase state file maximum length exceeded")));
 
+	Assert(gxact->prepare_2pc_mem_data == NULL);
+	Assert(gxact->prepare_2pc_proc == 0);
+
+	if (IsLogicalWorker())
+	{
+		size_t			len = 0;
+		size_t			offset = 0;
+
+		for (record = records.head; record != NULL; record = record->next)
+			len += record->len;
+
+		if (len > 0)
+		{
+			MemoryContext	oldmemctx;
+
+			oldmemctx = MemoryContextSwitchTo(TopMemoryContext);
+
+			gxact->prepare_2pc_mem_data = palloc(len);
+			gxact->prepare_2pc_mem_len = len;
+			gxact->prepare_2pc_proc = getpid();
+
+			for (record = records.head; record != NULL; record = record->next)
+			{
+memcpy((char *)gxact->prepare_2pc_mem_data + offset, record->data, record->len);
+offset += record->len;
+			}
+
+			MemoryContextSwitchTo(oldmemctx);
+		}
+	}
+
 	/*
 	 * Now writing 2PC state data to WAL. We let the WAL's CRC protection
 	 * cover us, so no need to calculate a separate CRC.
@@ -1202,8 +1243,24 @@ EndPrepare(GlobalTransaction gxact)
    gxact->prepare_end_lsn);
 	}
 
+#if !defined(POSTGRESQL_TWOPHASE_SUPPORT_ASYNC_COMMIT)
+
 	XLogFlush(gxact->prepare_end_lsn);
 
+#else
+
+	if (synchronous_commit > SYNCHRONOUS_COMMIT_OFF)
+	{
+		/* Flush XLOG to disk */
+		XLogFlush(gxact->prepare_end_lsn);
+	}
+	else
+	{
+		XLogSetAsyncXactLSN(gxact->prepare_end_lsn);
+	}
+
+#endif
+
 	/* If we crash now, we have prepared: WAL replay will fix things */
 
 	/* Store record's start location to read that later on Commit */
@@ 

Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-26 Thread Amit Kapila
On Fri, Feb 23, 2024 at 10:41 PM Давыдов Виталий
 wrote:
>
> Amit Kapila  wrote:
>
> I don't see we do anything specific for 2PC transactions to make them behave 
> differently than regular transactions with respect to synchronous_commit 
> setting. What makes you think so? Can you pin point the code you are 
> referring to?
>
> Yes, sure. The function RecordTransactionCommitPrepared is called on prepared 
> transaction commit (twophase.c). It calls XLogFlush unconditionally. The 
> function RecordTransactionCommit (for regular transactions, xact.c) calls 
> XLogFlush if synchronous_commit > OFF, otherwise it calls XLogSetAsyncXactLSN.
>
> There is some comment in RecordTransactionCommitPrepared (by Bruce Momjian) 
> that shows that async commit is not supported yet:
> /*
> * We don't currently try to sleep before flush here ... nor is there any
> * support for async commit of a prepared xact (the very idea is probably
> * a contradiction)
> */
> /* Flush XLOG to disk */
> XLogFlush(recptr);
>

It seems this comment is added in the commit 4a78cdeb where we added
async commit support. I think the reason is probably that when the WAL
record for prepared is already flushed then what will be the idea of
async commit here?

> Right, I think for this we need to implement parallel apply.
>
> Yes, parallel apply is a good point. But, I believe, it will not work if 
> asynchronous commit is not supported. You have only one receiver process 
> which should dispatch incoming messages to parallel workers. I guess, you 
> will never reach such rate of parallel execution on replica as on the master 
> with multiple backends.
>
>
> Can you be a bit more specific about what exactly you have in mind to achieve 
> the above solutions?
>
> My proposal is to implement async commit for 2PC transactions as it is for 
> regular transactions. It should significantly speedup the catchup process. 
> Then, think how to apply in parallel, which is much diffcult to do. The 
> current problem is to get 2PC state from the WAL on commit prepared. At this 
> moment, the WAL is not flushed yet, commit function waits until WAL with 2PC 
> state is to be flushed. I just tried to do it in my sandbox and found such a 
> problem. Inability to get 2PC state from unflushed WAL stops me right now. I 
> think about possible solutions.
>

At commit prepared, it seems we read prepare's WAL record, right? If
so, it is not clear to me do you see a problem with a flush of
commit_prepared or reading WAL for prepared or both of these.

-- 
With Regards,
Amit Kapila.




Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-23 Thread Давыдов Виталий

Hi Amit,
Amit Kapila  wrote:
I don't see we do anything specific for 2PC transactions to make them behave 
differently than regular transactions with respect to synchronous_commit 
setting. What makes you think so? Can you pin point the code you are referring 
to?Yes, sure. The function RecordTransactionCommitPrepared is called on 
prepared transaction commit (twophase.c). It calls XLogFlush unconditionally. 
The function RecordTransactionCommit (for regular transactions, xact.c) calls 
XLogFlush if synchronous_commit > OFF, otherwise it calls XLogSetAsyncXactLSN.

There is some comment in RecordTransactionCommitPrepared (by Bruce Momjian) 
that shows that async commit is not supported yet:
/*
* We don't currently try to sleep before flush here ... nor is there any
* support for async commit of a prepared xact (the very idea is probably
* a contradiction)
*/
/* Flush XLOG to disk */
XLogFlush(recptr);
Right, I think for this we need to implement parallel apply.Yes, parallel apply 
is a good point. But, I believe, it will not work if asynchronous commit is not 
supported. You have only one receiver process which should dispatch incoming 
messages to parallel workers. I guess, you will never reach such rate of 
parallel execution on replica as on the master with multiple backends.
 
Can you be a bit more specific about what exactly you have in mind to achieve 
the above solutions?My proposal is to implement async commit for 2PC 
transactions as it is for regular transactions. It should significantly speedup 
the catchup process. Then, think how to apply in parallel, which is much 
diffcult to do. The current problem is to get 2PC state from the WAL on commit 
prepared. At this moment, the WAL is not flushed yet, commit function waits 
until WAL with 2PC state is to be flushed. I just tried to do it in my sandbox 
and found such a problem. Inability to get 2PC state from unflushed WAL stops 
me right now. I think about possible solutions.

The idea with enableFsync is not a suitable solution, in general, I think. I 
just pointed it as an alternate idea. You just do enableFsync = false before 
prepare or commit prepared and do enableFsync = true after these functions. In 
this case, 2PC records will not be fsync-ed, but FlushPtr will be increased. 
Thus, 2PC state can be read from WAL on commit prepared without waiting. To 
make it work correctly, I guess, we have to do some additional work to keep 
more wal on the master and filter some duplicate transactions on the replica, 
if replica restarts during catchup.
​​
With best regards,
​​Vitaly Davydov

 


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-23 Thread Давыдов Виталий

Hi Ajin,

Thank you for your feedback. Could you please try to increase the number of 
clients (-c pgbench option) up to 20 or more? It seems, I forgot to specify it.

With best regards,
Vitaly Davydov On Fri, Feb 23, 2024 at 12:29 AM Давыдов Виталий 
 wrote:
Dear All,
I'd like to present and talk about a problem when 2PC transactions are applied 
quite slowly on a replica during logical replication. There is a master and a 
replica with established logical replication from the master to the replica 
with twophase = true. With some load level on the master, the replica starts to 
lag behind the master, and the lag will be increasing. We have to significantly 
decrease the load on the master to allow replica to complete the catchup. Such 
problem may create significant difficulties in the production. The problem 
appears at least on REL_16_STABLE branch.
To reproduce the problem:
 * Setup logical replication from master to replica with subscription parameter 
twophase =  true. * Create some intermediate load on the master (use pgbench 
with custom sql with prepare+commit) * Optionally switch off the replica for 
some time (keep load on master). * Switch on the replica and wait until it 
reaches the master.
The replica will never reach the master with even some low load on the master. 
If to remove the load, the replica will reach the master for much greater time, 
than expected. I tried the same for regular transactions, but such problem 
doesn't appear even with a decent load.
  I tried this setup and I do see that the logical subscriber does reach the 
master in a short time. I'm not sure what I'm missing. I stopped the logical 
subscriber in between while pgbench was running and then started it again and 
ran the following:postgres=# SELECT sent_lsn, pg_current_wal_lsn() FROM 
pg_stat_replication;
 sent_lsn  | pg_current_wal_lsn
---+
 0/6793FA0 | 0/6793FA0 <=== caught up
(1 row)
 My pgbench command:pgbench postgres -p 6972 -c 2 -j 3 -f /home/ajin/test.sql 
-T 200 -P 5 my custom sql file:cat test.sql
SELECT md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$; regards,Ajin CherianFujitsu Australia 

 


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-22 Thread Ajin Cherian
On Fri, Feb 23, 2024 at 12:29 AM Давыдов Виталий 
wrote:

> Dear All,
>
> I'd like to present and talk about a problem when 2PC transactions are
> applied quite slowly on a replica during logical replication. There is a
> master and a replica with established logical replication from the master
> to the replica with twophase = true. With some load level on the master,
> the replica starts to lag behind the master, and the lag will be
> increasing. We have to significantly decrease the load on the master to
> allow replica to complete the catchup. Such problem may create significant
> difficulties in the production. The problem appears at least on
> REL_16_STABLE branch.
>
> To reproduce the problem:
>
>- Setup logical replication from master to replica with subscription
>parameter twophase =  true.
>- Create some intermediate load on the master (use pgbench with custom
>sql with prepare+commit)
>- Optionally switch off the replica for some time (keep load on
>master).
>- Switch on the replica and wait until it reaches the master.
>
> The replica will never reach the master with even some low load on the
> master. If to remove the load, the replica will reach the master for much
> greater time, than expected. I tried the same for regular transactions, but
> such problem doesn't appear even with a decent load.
>
>
>
I tried this setup and I do see that the logical subscriber does reach the
master in a short time. I'm not sure what I'm missing. I stopped the
logical subscriber in between while pgbench was running and then started it
again and ran the following:
postgres=# SELECT sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication;
 sent_lsn  | pg_current_wal_lsn
---+
 0/6793FA0 | 0/6793FA0 <=== caught up
(1 row)

My pgbench command:
pgbench postgres -p 6972 -c 2 -j 3 -f /home/ajin/test.sql -T 200 -P 5

my custom sql file:
cat test.sql
SELECT md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$;

regards,
Ajin Cherian
Fujitsu Australia


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-22 Thread Amit Kapila
On Thu, Feb 22, 2024 at 6:59 PM Давыдов Виталий
 wrote:
>
> I'd like to present and talk about a problem when 2PC transactions are 
> applied quite slowly on a replica during logical replication. There is a 
> master and a replica with established logical replication from the master to 
> the replica with twophase = true. With some load level on the master, the 
> replica starts to lag behind the master, and the lag will be increasing. We 
> have to significantly decrease the load on the master to allow replica to 
> complete the catchup. Such problem may create significant difficulties in the 
> production. The problem appears at least on REL_16_STABLE branch.
>
> To reproduce the problem:
>
> Setup logical replication from master to replica with subscription parameter 
> twophase =  true.
> Create some intermediate load on the master (use pgbench with custom sql with 
> prepare+commit)
> Optionally switch off the replica for some time (keep load on master).
> Switch on the replica and wait until it reaches the master.
>
> The replica will never reach the master with even some low load on the 
> master. If to remove the load, the replica will reach the master for much 
> greater time, than expected. I tried the same for regular transactions, but 
> such problem doesn't appear even with a decent load.
>
> I think, the main proplem of 2PC catchup bad performance - the lack of 
> asynchronous commit support for 2PC. For regular transactions asynchronous 
> commit is used on the replica by default (subscrition sycnronous_commit = 
> off). It allows the replication worker process on the replica to avoid fsync 
> (XLogFLush) and to utilize 100% CPU (the background wal writer or 
> checkpointer will do fsync). I agree, 2PC are mostly used in multimaster 
> configurations with two or more nodes which are performed synchronously, but 
> when the node in catchup (node is not online in a multimaster cluster), 
> asynchronous commit have to be used to speedup the catchup.
>

I don't see we do anything specific for 2PC transactions to make them
behave differently than regular transactions with respect to
synchronous_commit setting. What makes you think so? Can you pin point
the code you are referring to?

> There is another thing that affects on the disbalance of the master and 
> replica performance. When the master executes requestes from multiple 
> clients, there is a fsync optimization takes place in XLogFlush. It allows to 
> decrease the number of fsync in case when a number of parallel backends write 
> to the WAL simultaneously. The replica applies received transactions in one 
> thread sequentially, such optimization is not applied.
>

Right, I think for this we need to implement parallel apply.

> I see some possible solutions:
>
> Implement asyncronous commit for 2PC transactions.
> Do some hacking with enableFsync when it is possible.
>

Can you be a bit more specific about what exactly you have in mind to
achieve the above solutions?

-- 
With Regards,
Amit Kapila.




Slow catchup of 2PC (twophase) transactions on replica in LR

2024-02-22 Thread Давыдов Виталий

Dear All,
I'd like to present and talk about a problem when 2PC transactions are applied 
quite slowly on a replica during logical replication. There is a master and a 
replica with established logical replication from the master to the replica 
with twophase = true. With some load level on the master, the replica starts to 
lag behind the master, and the lag will be increasing. We have to significantly 
decrease the load on the master to allow replica to complete the catchup. Such 
problem may create significant difficulties in the production. The problem 
appears at least on REL_16_STABLE branch.
To reproduce the problem:
 * Setup logical replication from master to replica with subscription parameter 
twophase =  true. * Create some intermediate load on the master (use pgbench 
with custom sql with prepare+commit) * Optionally switch off the replica for 
some time (keep load on master). * Switch on the replica and wait until it 
reaches the master.
The replica will never reach the master with even some low load on the master. 
If to remove the load, the replica will reach the master for much greater time, 
than expected. I tried the same for regular transactions, but such problem 
doesn't appear even with a decent load.
I think, the main proplem of 2PC catchup bad performance - the lack of 
asynchronous commit support for 2PC. For regular transactions asynchronous 
commit is used on the replica by default (subscrition sycnronous_commit = off). 
It allows the replication worker process on the replica to avoid fsync 
(XLogFLush) and to utilize 100% CPU (the background wal writer or checkpointer 
will do fsync). I agree, 2PC are mostly used in multimaster configurations with 
two or more nodes which are performed synchronously, but when the node in 
catchup (node is not online in a multimaster cluster), asynchronous commit have 
to be used to speedup the catchup.
There is another thing that affects on the disbalance of the master and replica 
performance. When the master executes requestes from multiple clients, there is 
a fsync optimization takes place in XLogFlush. It allows to decrease the number 
of fsync in case when a number of parallel backends write to the WAL 
simultaneously. The replica applies received transactions in one thread 
sequentially, such optimization is not applied.
I see some possible solutions:
 * Implement asyncronous commit for 2PC transactions. * Do some hacking with 
enableFsync when it is possible.
I think, asynchronous commit support for 2PC transactions should significantly 
increase replica performance and help to solve this problem. I tried to 
implement it (like for usual transactions) but I've found another problem: 2PC 
state is stored in WAL on prepare, on commit we have to read 2PC state from WAL 
but the read is delayed until WAL is flushed by the background wal writer (read 
LSN should be less than flush LSN). Storing 2PC state in a shared memory (as it 
proposed earlier) may help.

I used the following query to monitor the catchup progress on the master:SELECT 
sent_lsn, pg_current_wal_lsn() FROM pg_stat_replication;
I used the following script for pgbench to the master:SELECT 
md5(random()::text) as mygid \gset
BEGIN;
DELETE FROM test WHERE v = pg_backend_pid();
INSERT INTO test(v) SELECT pg_backend_pid();
PREPARE TRANSACTION $$:mygid$$;
COMMIT PREPARED $$:mygid$$;
 
What do you think?
 
With best regards,
Vitaly Davydov