On Fri, Jan 23, 2026 at 2:04 AM Masahiko Sawada <[email protected]> wrote: > > On Wed, Jan 21, 2026 at 7:49 PM shveta malik <[email protected]> wrote: > > > > On Thu, Jan 22, 2026 at 5:19 AM Masahiko Sawada <[email protected]> > > wrote: > > > > > > On Mon, Jan 19, 2026 at 10:38 PM shveta malik <[email protected]> > > > wrote: > > > > > > > > On Wed, Jan 14, 2026 at 11:24 PM Masahiko Sawada > > > > <[email protected]> wrote: > > > > > > > > > > I've attached the updated patch. > > > > > > > > > > > > > Thank You for the patch. I like the idea of optimization. Few initial > > > > comments: > > > > > > Thank you for reviewing the patch! > > > > > > > > > > > 1) > > > > + * The query returns the slot names and their caught-up status in > > > > + * the same order as the results collected by > > > > + * get_old_cluster_logical_slot_infos(). If this query is changed, > > > > > > > > I could not find the function get_old_cluster_logical_slot_infos(), do > > > > you mean get_old_cluster_logical_slot_infos_query()? > > > > > > It seems an oversight in commit 6d3d2e8e541f0. I think it should be > > > get_db_rel_and_slot_infos(). > > > > > > > > > > > 2) > > > > " WHERE database = current_database() AND " > > > > " slot_type = 'logical' AND " > > > > > > > > Is there a reason why database = current_database() is placed before > > > > slot_type = 'logical'? I am not sure how the PostgreSQL optimizer and > > > > executor will order these predicates, but from the first look, > > > > slot_type = 'logical' appears cheaper and could be placed first, > > > > consistent with the ordering used at other places. > > > > > > Changed. > > > > > > > > > > > 3) > > > > Shouldn’t we add a sanity check inside > > > > get_old_cluster_logical_slot_infos_query() to ensure that when > > > > skip_caught_up_check is true, we are on PostgreSQL 18 or lower? This > > > > would make the function safer for future use if it's called elsewhere. > > > > I understand the caller already performs a similar check, but I think > > > > it's more appropriate here since we call > > > > binary_upgrade_logical_slot_has_caught_up() from inside, which doesn’t > > > > even exist on newer versions. > > > > > > What kind of sanity check did you mean? We can have a check with > > > pg_fatal() but it seems almost the same to me even if pg_upgrade fails > > > with an error due to missing > > > binary_upgrade_logical_slot_has_caught_up(). > > > > I was referring to a development-level sanity check, something like: > > > > /* skip_caught_up_check is required iff PG19 or newer */ > > Assert((GET_MAJOR_VERSION(cluster->major_version) >= 1900) == > > skip_caught_up_check); > > > > But performing this check requires access to the cluster version (or > > cluster information), which this function currently does not have. > > Given that, do you think it would make sense to pass the cluster as an > > argument to this function in order to perform the sanity check here? > > Hmm, I think it's better not to have the same check in multiple > places, but it might make sense to have > get_old_cluster_logical_slot_infos_query() decide whether to use the > fast method. I've updated the patch accordingly, please review it. >
Okay, looks good. Just one minor thing: + * Note that binary_upgrade_logical_slot_has_caught_up() is available only + * PG18 or older. For PG19 or newer *use_fast_caught_up_check should be + * set true, and use binary_upgrade_check_logical_slot_pending_wal() + * instead in the separate query (see slot_caught_up_info_query). Shall we tweak it slightly: * Note that binary_upgrade_logical_slot_has_caught_up() is available * only in PG18 and earlier. For PG19 and later, set *use_fast_caught_up_check * to true and use binary_upgrade_check_logical_slot_pending_wal() instead, * in a separate query (see slot_caught_up_info_query in get_db_rel_and_slot_infos()). > > > > > > > > > > 4) > > > > +# Check the file content. While both test_slot1 and test_slot2 should > > > > be reporting > > > > +# that they have unconsumed WAL records, test_slot3 should not be > > > > reported as > > > > +# it has caught up. > > > > > > > > Can you please elaborate the reason behind test_slot3 not being > > > > reported? Also mention in the comment if possible. > > > > > > We advance test_slot3 to the current WAL LSN before executing > > > pg_upgrade, so the test_slot3 should have consumed all pending WALs. > > > Please refer to the following changes: > > > > I understand the test, and the comments are clear to me. I also > > understand that only test_slot3 is expected to be in the caught-up > > state. My questions were specifically about the following points: > > 1) Why do we expect 'slot3 caught-up' not to be mentioned in the LOG? > > Is it simply because there is no corresponding logging in the code, or > > is this behavior related to some aspect of your fix that I may have > > missed? > > > > 2) In general, we do not validate the absence of LOG messages in > > tests. Why is this considered a special case where such a check is > > appropriate? > > What LOG do you refer to? In these tests, we check the > invalid_logical_slots.txt file where pg_upgrade reports only invalid > slots (in terms of pg_upgrade). For test_slot3, it should not be > mentioned in that file as it has caught up. Given that the file has > only invalid slots, checking the absence of test_slot3 in the file > makes sense to me. > Okay, I get the intent now. Thanks! Other than the comment suggestion above, the patch LGTM. thanks Shveta
