On Tuesday, March 7, 2023 9:47 PM Önder Kalacı <onderkal...@gmail.com> wrote:
Hi, > > > Let me give an example to demonstrate why I thought something is fishy > > > here: > > > > > > Imagine rel has a (non-default) REPLICA IDENTITY with Oid=1111. > > > Imagine the same rel has a PRIMARY KEY with Oid=2222. > > > > > Hmm, alright, this is syntactically possible, but not sure if any user > would do this. Still thanks for catching this. > > And, you are right, if a user has created such a schema, > IdxIsRelationIdentityOrPK() > would return the wrong result and we'd use sequential scan instead of index > scan. > This would be a regression. I think we should change the function. I am looking at the latest patch and have a question about the following code. /* Try to find the tuple */ - if (index_getnext_slot(scan, ForwardScanDirection, outslot)) + while (index_getnext_slot(scan, ForwardScanDirection, outslot)) { - found = true; + /* + * Avoid expensive equality check if the index is primary key or + * replica identity index. + */ + if (!idxIsRelationIdentityOrPK) + { + if (eq == NULL) + { +#ifdef USE_ASSERT_CHECKING + /* apply assertions only once for the input idxoid */ + IndexInfo *indexInfo = BuildIndexInfo(idxrel); + Assert(IsIndexUsableForReplicaIdentityFull(indexInfo)); +#endif + + /* + * We only need to allocate once. This is allocated within per + * tuple context -- ApplyMessageContext -- hence no need to + * explicitly pfree(). + */ + eq = palloc0(sizeof(*eq) * outslot->tts_tupleDescriptor->natts); + } + + if (!tuples_equal(outslot, searchslot, eq)) + continue; + } IIRC, it invokes tuples_equal for all cases unless we are using replica identity key or primary key to scan. But there seem some other cases where the tuples_equal looks unnecessary. For example, if the table on subscriber don't have a PK or RI key but have a not-null, non-deferrable, unique key. And if the apply worker choose this index to do the scan, it seems we can skip the tuples_equal as well. --Example pub: CREATE TABLE test_replica_id_full (a int, b int not null); ALTER TABLE test_replica_id_full REPLICA IDENTITY FULL; CREATE PUBLICATION tap_pub_rep_full FOR TABLE test_replica_id_full; sub: CREATE TABLE test_replica_id_full (a int, b int not null); CREATE UNIQUE INDEX test_replica_id_full_idx ON test_replica_id_full(b); -- I am not 100% sure if it's worth optimizing this by complicating the check in idxIsRelationIdentityOrPK. What do you think ? Best Regards, Hou zj