> On Mar 23, 2026, at 16:41, Xuneng Zhou <[email protected]> wrote: > > Hi, > > On Mon, Mar 23, 2026 at 3:57 PM Chao Li <[email protected]> wrote: > > > > > > > > > On Mar 21, 2026, at 18:29, Xuneng Zhou <[email protected]> wrote: > > > > > > On Thu, Mar 19, 2026 at 1:07 PM Chao Li <[email protected]> wrote: > > >> > > >> > > >> > > >>> On Feb 26, 2026, at 14:59, Chao Li <[email protected]> wrote: > > >>> > > >>> > > >>> > > >>>> On Jan 28, 2026, at 10:49, Chao Li <[email protected]> wrote: > > >>>> > > >>>> > > >>>> > > >>>>> On Jan 27, 2026, at 16:30, Chao Li <[email protected]> wrote: > > >>>>> > > >>>>> > > >>>>> > > >>>>>> On Jan 27, 2026, at 15:59, Chao Li <[email protected]> wrote: > > >>>>>> > > >>>>>> > > >>>>>> > > >>>>>>> On Jan 27, 2026, at 15:39, Michael Paquier <[email protected]> > > >>>>>>> wrote: > > >>>>>>> > > >>>>>>> On Tue, Jan 27, 2026 at 01:13:32PM +0800, Chao Li wrote: > > >>>>>>>> I found this bug while working on a related patch [1]. > > >>>>>>>> > > >>>>>>>> When ALTER TABLE ... ALTER COLUMN TYPE causes an index rebuild, and > > >>>>>>>> that index is used as REPLICA IDENTITY on a partitioned table, the > > >>>>>>>> replica identity marking on partitions can be silently lost after > > >>>>>>>> the > > >>>>>>>> rebuild. > > >>>>>>> > > >>>>>>> I am slightly confused by the tests included in the proposed patch. > > >>>>>>> On HEAD, if I undo the proposed changes of tablecmds.c, the tests > > >>>>>>> pass. If I run the tests of the patch with the changes of > > >>>>>>> tablecmds.c, the tests also pass. > > >>>>>> > > >>>>>> Oops, that isn’t supposed to be so. I’ll check the test. > > >>>>>> > > >>>>> > > >>>>> Okay, I see the problem is here: > > >>>>> ``` > > >>>>> +CREATE UNIQUE INDEX test_replica_identity_partitioned_pkey ON > > >>>>> test_replica_identity_partitioned (id); > > >>>>> ``` > > >>>>> > > >>>>> I missed to add column “val” into the index, so that alter type of > > >>>>> val didn’t cause index rebuild. > > >>>>> > > >>>>> Ideally, it’s better to also verify that index OIDs should have > > >>>>> changed before and after alter column type, but I haven’t figured out > > >>>>> how to do so. Do you have an idea? > > >>>> > > >>>> I just updated the test to store index OIDs before and after rebuild > > >>>> into 2 temp tables, so that we can compare the OIDs to verify rebuild > > >>>> happens and replica identity preserved. > > >>>> > > >>>> I tried to port the test to master branch, and the test failed. From > > >>>> the test diff file, we can see replica identity lost on 3 leaf > > >>>> partitions: > > >>>> ``` > > >>>> @@ -360,9 +360,9 @@ > > >>>> ORDER BY b.index_name; > > >>>> index_name | rebuilt | ri_lost > > >>>> ---------------------------------------------------+---------+--------- > > >>>> - test_replica_identity_partitioned_p1_id_val_idx | t | f > > >>>> - test_replica_identity_partitioned_p2_1_id_val_idx | t | f > > >>>> - test_replica_identity_partitioned_p2_2_id_val_idx | t | f > > >>>> + test_replica_identity_partitioned_p1_id_val_idx | t | t > > >>>> + test_replica_identity_partitioned_p2_1_id_val_idx | t | t > > >>>> + test_replica_identity_partitioned_p2_2_id_val_idx | t | t > > >>>> test_replica_identity_partitioned_p2_id_val_idx | t | f > > >>>> test_replica_identity_partitioned_pkey | t | f > > >>>> (5 rows) > > >>>> ``` > > >>>> > > >>>> With this patch, the test passes and all replica identity are > > >>>> preserved. > > >>>> > > >>>> PFA v3: > > >>>> * Enhanced the test. > > >>>> * A small change in find_partition_replica_identity_indexes(): if we > > >>>> will not update a partition, then unlock it. > > >>>> > > >>>> Best regards, > > >>>> -- > > >>>> Chao Li (Evan) > > >>>> HighGo Software Co., Ltd. > > >>>> https://www.highgo.com/ > > >>>> > > >>>> > > >>>> > > >>>> > > >>>> <v3-0001-tablecmds-fix-bug-where-index-rebuild-loses-repli.patch> > > >>> > > >>> The CF asked for a rebase, thus rebased as v4. > > >>> > > > > > > > > > Hi, I reproduced this with the test case, and the patch appears > > > to resolve it. > > > > > > Some comments on v5: > > > > Thanks a lot for your review. > > > > > > > > -- Whether it makes sense to use a single list of pair structs instead > > > of two parallel OID lists (replicaIdentityIndexOids + > > > replicaIdentityTableOids) to avoid accidental desync. > > > > I don’t think that helps much. The current code of rebuilding index uses > > two lists changedIndexOids and changedIndexDefs. So, this patch matches the > > pattern of the existing code. > > > > > > > > -- It would be better to make lock handling in > > > find_partition_replica_identity_indexes() consistent > > > (relation_open(..., NoLock) if child is already locked, and avoid > > > mixed relation_close(..., lockmode)/NoLock behavior). > > > > That’s because if we are going to update a partition, then we need to hold > > the lock on the partition. > > There is one locking cleanup in find_partition_replica_identity_indexes(). > > find_inheritance_children(relId, lockmode) already acquires lockmode on > every partition it returns, so I think the later relation_open() should use > NoLock, not lockmode. For the same reason, all relation_close() calls in > this function should use NoLock as well. > > Today the code does: > > partRel =relation_open(partRelOid, lockmode); > ... > relation_close(partRel, lockmode); > > That does not cause a correctness issue, because the lock manager > reference-counts same-transaction acquisitions, so the lock remains held > either way. But it is misleading: it suggests that relation_open() is where > the partition lock is taken, and that the early relation_close(..., > lockmode) > is intentionally releasing it. Neither is actually true here, because the > lock > was already acquired by find_inheritance_children(). > > So I think this should be adjusted to: > > partRel = relation_open(partRelOid, NoLock); > > and all close sites in this function should be: > > relation_close(partRel, NoLock); > > The comment on the early-close path should also be updated, since it is not > really unlocking the partition. Something like "No matching partition index; > just close the relcache entry" would match the actual behavior better. >
Okay, in find_partition_replica_identity_indexes, we can use NOLOCK to open partitions as they have been locked by find_inheritance_children. But for those partitions that we won’t touch, we still want to unlock them. PFA v7. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
v7-0001-tablecmds-fix-bug-where-index-rebuild-loses-repli.patch
Description: Binary data
