On Thu, 27 Nov 2025 at 02:32, Masahiko Sawada <[email protected]> wrote:
>
> On Wed, Nov 26, 2025 at 3:47 AM Amit Kapila <[email protected]> wrote:
> >
> > On Wed, Nov 26, 2025 at 10:31 AM Amit Kapila <[email protected]> 
> > wrote:
> > >
> > > On Wed, Nov 26, 2025 at 5:59 AM Masahiko Sawada <[email protected]> 
> > > wrote:
> > > >
> > > >
> > > > After thinking of this case, I'm concerned that we would hit the
> > > > existing similar assertion failures or assertions that future changes
> > > > could introduce because the value returned by XLogLogicalInfoActive()
> > > > could be changed even within the same transaction whereas the result
> > > > from XLogStandbyInfoActive() doesn't. One possible change would be to
> > > > ensure that XLogLogicalInfoActive() returns the same result within the
> > > > same transaction. It would be more straightforward and closer to the
> > > > current GUC-based behavior.
> > > >
> > >
> > > Agreed that this is worth considering. So, the only downside it could
> > > have is the performance impact where it is not used like when
> > > wal_level is replica and the user didn't create any logical slot. So,
> > > short transactions where we do call XLogLogicalInfoActive() directly
> > > or indirectly only once or twice, we always need to invoke
> > > CheckXLogLogicalInfoSlow(). Can we try some workloads to see if there
> > > is any noticeable impact?
> > >
> >
> > One simple test could be: Begin; Select pg_current_xact_id(); Commit;
> > This will let newly added function twice in each transaction.
>
> I've done a simple performance benchmark with and without this patch.
> There seems to be no noticeable performance regression.
>
> w/o patch:
> transaction type: /tmp/test.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> maximum number of tries: 1
> duration: 180 s
> number of transactions actually processed: 3913028
> number of failed transactions: 0 (0.000%)
> latency average = 0.046 ms
> latency stddev = 0.012 ms
> initial connection time = 2.077 ms
> tps = 21739.289857 (without initial connection time)
>
> w/ patch:
> transaction type: /tmp/test.sql
> scaling factor: 1
> query mode: simple
> number of clients: 1
> number of threads: 1
> maximum number of tries: 1
> duration: 180 s
> number of transactions actually processed: 3922125
> number of failed transactions: 0 (0.000%)
> latency average = 0.046 ms
> latency stddev = 0.012 ms
> initial connection time = 2.049 ms
> tps = 21789.823021 (without initial connection time)
>
> >
> > Few comments on 0001:
> > =====================
> > 1.
> > <literal>wal_level_insufficient</literal> means that the
> > -          primary doesn't have a <xref linkend="guc-wal-level"/> 
> > sufficient to
> > -          perform logical decoding.  It is set only for logical slots.
> > +          logical decoding is disabled on the primary (See
> > +          <xref linkend="logicaldecoding-explanation-log-dec"/>). It is set
> > +          only for logical slots.
> >
> > Can't we use the existing description and use effective-wal-level
> > instead of wal-level? We can provide the link you mentioned.
>
> Agreed.
>
> >
> > 2.
> >   * Skip this if we're taking a full-page image of the new page, as we
> >   * don't include the new tuple in the WAL record in that case.  Also
> > - * disable if wal_level='logical', as logical decoding needs to be able to
> > - * read the new tuple in whole from the WAL record alone.
> > + * disable if logical decoding is enabled and the relation requires WAL to
> > + * be logged for logical decoding, as it needs to be able to read the new
> > + * tuple in whole from the WAL record alone.
> >   */
> >   if (oldbuf == newbuf && !need_tuple_data &&
> >   !XLogCheckBufferNeedsBackup(newbuf))
> > @@ -9064,8 +9065,8 @@ log_heap_update(Relation reln, Buffer oldbuf,
> >  /*
> >   * Perform XLogInsert of an XLOG_HEAP2_NEW_CID record
> >   *
> > - * This is only used in wal_level >= WAL_LEVEL_LOGICAL, and only for 
> > catalog
> > - * tuples.
> > + * This is only used when effective_wal_level is logical, and only for
> > + * catalog tuples.
> >
> > In the first comment, you have used "logical decoding is enabled," and
> > in the second one, you have used "effective_wal_level is logical." Can
> > we effective_wal_level is logical" for consistency sake, and it sounds
> > a bit easier to follow?
>
> Agreed.
>
> >
> > 3.
> > Check the shared
> > + * status instead of the local XLogLogicalInfo because XLogLogicalInfo is
> > + * not updated synchronously during recovery.
> > + */
> > + if (RecoveryInProgress())
> > + return IsXLogLogicalInfoEnabled() ? "logical" : "replica";
> >
> > Is it true only during recovery or in general also? Say a checkpointer
> > disables it, then won't backends see it asynchronously?
>
> I meant that we don't update the process-local cache, XLogLogicalInfo,
> upon replaying the STATUS_CHANGE record during recovery, so we need to
> check the shared flag instead. During non-recovery, I think that
> processes should check its local cache because they're working based
> on their cache.
>
> >
> > 4.
> > + errhint("Before creating subscriptions, set \"wal_level\" >= 
> > \"replica\"")));
> >
> > errhint should always end with a full stop. Also, how about a slightly
> > modified hint message like: "Before creating subscriptions, ensure
> > that wal_level is set to replica or higher.”?
> >
> > 5.
> > This design choice exists because deactivation requires waiting
> > + * for concurrent attempts to update the logical decoding status, which 
> > can be
> > + * problematic when the process is holding interrupts.
> >
> > The part of the comment that says "concurrent attempts to update the
> > logical decoding status" is not clear to me. Which concurrent attempts
> > are you referring to here?
>
> Fixed the above points.
>
> I've squashed all fixup patches and attached the updated patch.

Few comments:
1) There is no validation after creation of temporary logical
replication slot, can we see if  "logical decoding is enabled upon
creating a new logical replication slot" is logged as a validation for
this:
+# Create a temporary logical slot but exits without releasing it explicitly.
+# This enables logical decoding but skips disabling it and delegates to the
+# checkpointer.
+$primary->safe_psql('postgres',
+       qq[select pg_create_logical_replication_slot('test_tmp_slot',
'test_decoding', true)]
+);
+
+# Wait for the checkpointer to disable logical decoding.
+wait_for_logical_decoding_disabled($primary);

I had a look at the header file inclusions in the patch and found few
issues with it:
2) Here logicalctl.h should be included before logicallauncher.h:
 #include "postmaster/interrupt.h"
 #include "replication/logicallauncher.h"
+#include "replication/logicalctl.h"
 #include "replication/slotsync.h"

3) I was able to compile without inclusion of logicalctl.h, may be it
is not required because we include origin.h which includes xlog.h
which includes logicalctl.h already:
diff --git a/src/backend/access/transam/xlog.c
b/src/backend/access/transam/xlog.c
index 22d0a2e8c3a..6f3f10af8fd 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -79,7 +79,9 @@
 #include "postmaster/walsummarizer.h"
 #include "postmaster/walwriter.h"
 #include "replication/origin.h"
+#include "replication/logicalctl.h"

4) Similarly it is not required in checkpointer.c, logical.c,
logicalctl.c, slotsync.c, slot.c, slotfuncs.c, walsender.c, ipci.c,
procsignal.c, standby.c and postinit.c

5) Few comments in the test file exceeds 80 chars, if possible we can
move it to the next line:
5.a) +# Create and drop another logical slot, then check if
effective_wal_level remains
+# 'logical'.
5.b) +# Add other settings to test if we disable logical decoding when
invalidating the last
+# logical slot.
5.c) # Check if logical decoding is disabled after invalidating the
last logical slot.
5.d) # Check if effective_wal_level is increased to 'logical' on the
cascaded standby.
5.e) # Check that the logical decoding is not enabled on the standby4.
Note that it still has
# the invalidated logical slot.
5.f) # Restart the primary with setting wal_level = 'logical' and
create a new logical
# slot.
5.g) # Drop the logical slot, requesting to disable logical decoding
to the checkpointer.
5.h) # Test the abort process of logical decoding activation. We drop
the primary's
# slot to decrease its effective_wal_level to 'replica'.

Regards,
Vignesh


Reply via email to