On Thu, Nov 27, 2025 at 4:59 AM Amit Kapila <[email protected]> wrote:
>
> On Thu, Nov 27, 2025 at 2:32 AM Masahiko Sawada <[email protected]> wrote:
> >
> > I've squashed all fixup patches and attached the updated patch.
> >
>
> 1.
>          <literal>wal_level_insufficient</literal> means that the
> -        primary doesn't have a <xref linkend="guc-wal-level"/> sufficient to
> -        perform logical decoding. It is set only for logical slots.
> +        primary doesn't have a <xref linkend="guc-effective-wal-level"/>
> +        to perform logical decoding.
>
> sufficient is missing after "guc-effective-wal-level"
>
> 2.
> + * With 'minimal' WAL level, there are not logical replication slots
> + * during recovery.
>
> /not/no. Typo
>
> 3.
>             case XLOG_LOGICAL_DECODING_STATUS_CHANGE:
>                 {
> -                   xl_parameter_change *xlrec =
> -                       (xl_parameter_change *) XLogRecGetData(buf->record);
> +                   bool        logical_decoding;
>
> -                   /*
> -                    * If wal_level on the primary is reduced to less than
> -                    * logical, we want to prevent existing logical slots from
> -                    * being used. Existing logical slots on the standby get
> -                    * invalidated when this WAL record is replayed; and further,
> -                    * slot creation fails when wal_level is not sufficient; but
> -                    * all these operations are not synchronized, so a logical
> -                    * slot may creep in while the wal_level is being reduced.
> -                    * Hence this extra check.
> -                    */
> -                   if (xlrec->wal_level < WAL_LEVEL_LOGICAL)
> +                   memcpy(&logical_decoding, XLogRecGetData(buf->record), sizeof(bool));
>
> The patch has entirely removed this comment but I feel we should write
> something similar to it, especially for the part: "Existing logical
> slots on the standby get invalidated when this WAL record is replayed;
> and further, slot creation fails when wal_level is not sufficient; but
> all these operations are not synchronized, so a logical slot may creep
> in while the wal_level is being reduced. Hence this extra check." Did
> anything change about this part of the comment?
>
> 4.
> WaitLSN "Waiting to read or update shared Wait-for-LSN state."
> +LogicalDecodingControl "Waiting to access logical decoding status
> information."
>
> Seeing the description just above, won't it be correct to say: "Waiting
> to read or update logical decoding status information."?
Fixed the above points.

> 5. The newly added test took approximately 8s on my machine, whereas
> other similar tests normally took 2-6s on the same machine, though
> there are some exceptions, such as 035_standby_logical_decoding.pl.
> See below results of some of the tests:
> -------
> [10:03:37] t/028_pitr_timelines.pl ............... ok  2254 ms ( 0.00 usr  0.00 sys +  0.39 cusr  0.83 csys =  1.22 CPU)
> [10:03:39] t/029_stats_restart.pl ................ ok  2915 ms ( 0.00 usr  0.00 sys +  0.34 cusr  0.42 csys =  0.76 CPU)
> [10:03:42] t/030_stats_cleanup_replica.pl ........ ok  2282 ms ( 0.00 usr  0.00 sys +  0.42 cusr  0.66 csys =  1.08 CPU)
> [10:03:45] t/031_recovery_conflict.pl ............ ok  2705 ms ( 0.00 usr  0.00 sys +  0.39 cusr  0.64 csys =  1.03 CPU)
> [10:03:47] t/032_relfilenode_reuse.pl ............ ok  2611 ms ( 0.01 usr  0.00 sys +  0.37 cusr  0.61 csys =  0.99 CPU)
> [10:03:50] t/033_replay_tsp_drops.pl ............. ok  4860 ms ( 0.00 usr  0.00 sys +  0.57 cusr  1.60 csys =  2.17 CPU)
> [10:03:55] t/034_create_database.pl .............. ok   922 ms ( 0.00 usr  0.00 sys +  0.19 cusr  0.19 csys =  0.38 CPU)
> [10:03:56] t/035_standby_logical_decoding.pl ..... ok 10899 ms ( 0.01 usr  0.00 sys +  1.13 cusr  2.21 csys =  3.35 CPU)
> [10:04:07] t/036_truncated_dropped.pl ............ ok  1781 ms ( 0.00 usr  0.00 sys +  0.21 cusr  0.22 csys =  0.43 CPU)
> [10:04:09] t/037_invalid_database.pl ............. ok   944 ms ( 0.00 usr  0.00 sys +  0.19 cusr  0.21 csys =  0.40 CPU)
> [10:04:09] t/038_save_logical_slots_shutdown.pl .. ok  1562 ms ( 0.00 usr  0.00 sys +  0.21 cusr  0.36 csys =  0.57 CPU)
> [10:04:11] t/039_end_of_wal.pl ................... ok  4638 ms ( 0.00 usr  0.00 sys +  0.48 cusr  0.66 csys =  1.14 CPU)
> [10:04:16] t/040_standby_failover_slots_sync.pl .. ok  7418 ms ( 0.01 usr  0.00 sys +  0.81 cusr  1.82 csys =  2.64 CPU)
> [10:04:23] t/041_checkpoint_at_promote.pl ........ ok  1535 ms ( 0.00 usr  0.00 sys +  0.29 cusr  0.51 csys =  0.80 CPU)
> [10:04:25] t/042_low_level_backup.pl ............. ok  2842 ms ( 0.00 usr  0.00 sys +  0.37 cusr  0.66 csys =  1.03 CPU)
> [10:04:27] t/043_no_contrecord_switch.pl ......... ok  1946 ms ( 0.00 usr  0.00 sys +  0.32 cusr  0.69 csys =  1.01 CPU)
> [10:04:29] t/044_invalidate_inactive_slots.pl .... ok   603 ms ( 0.00 usr  0.00 sys +  0.19 cusr  0.17 csys =  0.36 CPU)
> [10:04:30] t/045_archive_restartpoint.pl ......... ok  4324 ms ( 0.00 usr  0.00 sys +  0.97 cusr  0.66 csys =  1.63 CPU)
> [10:04:34] t/046_checkpoint_logical_slot.pl ...... ok  3322 ms ( 0.00 usr  0.00 sys +  0.33 cusr  0.55 csys =  0.88 CPU)
> [10:04:38] t/047_checkpoint_physical_slot.pl ..... ok  1919 ms ( 0.00 usr  0.00 sys +  0.28 cusr  0.43 csys =  0.71 CPU)
> [10:04:40] t/048_vacuum_horizon_floor.pl ......... ok  1413 ms ( 0.01 usr  0.00 sys +  0.26 cusr  0.53 csys =  0.80 CPU)
> [10:04:41] t/049_wait_for_lsn.pl ................. ok  6851 ms ( 0.00 usr  0.00 sys +  0.40 cusr  0.71 csys =  1.11 CPU)
> [10:04:48] t/050_effective_wal_level.pl .......... ok  8106 ms ( 0.00 usr  0.00 sys +  0.83 cusr  1.79 csys =  2.62 CPU)
> ---------
>
> I haven't investigated to see if we can optimize or reduce the test
> timing without impacting the coverage or functionality, but just see
> if we can reduce it. If you think we can't do anything on this front
> without compromising functionality coverage, then I think we can live
> with it.

I guess that we cannot avoid making this test heavy to some extent,
given that it involves multiple replication setups, standby promotions,
injection points, etc. I've reduced several tests, and I hope that
helped reduce the test duration in your environment. It has been
reduced a bit in my environment, but the test time is unstable.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
