Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Önder Kalacı Wed, 08 Mar 2023 03:12:16 -0800

Hi Peter, all


>
> 1.
> Saying the index "should" or "should not" do this or that sounds like
> it is still OK but just not recommended. TO remove this ambigity IMO
> most of the "should" ought to be changed to "must" because IIUC this
> patch will simply not consider indexes which do not obey all your
> rules.
>
> This comment applies to a few places (search for "should")
>
> e.g.1. - Commit Message
> e.g.2. - /* There should always be at least one attribute for the index
> scan. */
> e.g.3. - The function comment for
> FindUsableIndexForReplicaIdentityFull(Relation localrel)
>
> ======
> doc/src/sgml/logical-replication.sgml
>

I'm definitely not an expert on this subject (or native speaker). So, I'm
following your
suggestion.


>
> 2.
> A published table must have a “replica identity” configured in order
> to be able to replicate UPDATE and DELETE operations, so that
> appropriate rows to update or delete can be identified on the
> subscriber side. By default, this is the primary key, if there is one.
> Another unique index (with certain additional requirements) can also
> be set to be the replica identity. If the table does not have any
> suitable key, then it can be set to replica identity “full”, which
> means the entire row becomes the key. When replica identity “full” is
> specified, indexes can be used on the subscriber side for searching
> the rows. Candidate indexes must be btree, non-partial, and have at
> least one column reference (i.e. cannot consist of only expressions).
> These restrictions on the non-unique index properties adheres some of
> the restrictions that are enforced for primary keys. Internally, we
> follow a similar approach for supporting index scans within logical
> replication scope. If there are no such suitable indexes, the search
> on the subscriber side can be very inefficient, therefore replica
> identity “full” should only be used as a fallback if no other solution
> is possible. If a replica identity other than “full” is set on the
> publisher side, a replica identity comprising the same or fewer
> columns must also be set on the subscriber side. See REPLICA IDENTITY
> for details on how to set the replica identity. If a table without a
> replica identity is added to a publication that replicates UPDATE or
> DELETE operations then subsequent UPDATE or DELETE operations will
> cause an error on the publisher. INSERT operations can proceed
> regardless of any replica identity.
>
> ~
>
> "adheres some of" --> "adhere to some of" ?
>

sounds right, changed


>
> ======
> src/backend/executor/execReplication.c
>
> 3. build_replindex_scan_key
>
>   {
>   Oid operator;
>   Oid opfamily;
>   RegProcedure regop;
> - int pkattno = attoff + 1;
> - int mainattno = indkey->values[attoff];
> - Oid optype = get_opclass_input_type(opclass->values[attoff]);
> + int table_attno = indkey->values[index_attoff];
> + Oid optype = get_opclass_input_type(opclass->values[index_attoff]);
>
> These variable declarations might look tidier if you kept all the Oids
> together.
>
> ======
> src/backend/replication/logical/relation.c
>

Based on the discussions below, I kept as-is. I really don't want to do
unrelated
changes in this patch, as I also got several feedback for not doing it,



>
> 4. logicalrep_partition_open
>
> + /*
> + * Finding a usable index is an infrequent task. It occurs when an
> + * operation is first performed on the relation, or after invalidation of
> + * the relation cache entry (such as ANALYZE or CREATE/DROP index on the
> + * relation).
> + *
> + * We also prefer to run this code on the oldctx such that we do not
> + * leak anything in the LogicalRepPartMapContext (hence
> + * CacheMemoryContext).
> + */
> + entry->localindexoid = FindLogicalRepLocalIndex(partrel, remoterel)
>
> "such that" --> "so that" ?
>
>
fixed


> ~~~
>
> 5. IsIndexUsableForReplicaIdentityFull
>
> +bool
> +IsIndexUsableForReplicaIdentityFull(IndexInfo *indexInfo)
> +{
> + bool is_btree = (indexInfo->ii_Am == BTREE_AM_OID);
> + bool is_partial = (indexInfo->ii_Predicate != NIL);
> + bool is_only_on_expression = IsIndexOnlyOnExpression(indexInfo);
> +
> + if (is_btree && !is_partial && !is_only_on_expression)
> + {
> + return true;
> + }
> +
> + return false;
> +}
>
> SUGGESTION (no need for 2 returns here)
> return is_btree && !is_partial && !is_only_on_expression;
>

true, done


>
> ======
> src/backend/replication/logical/worker.c
>
> 6. FindReplTupleInLocalRel
>
> static bool
> FindReplTupleInLocalRel(EState *estate, Relation localrel,
> LogicalRepRelation *remoterel,
> Oid localidxoid,
> TupleTableSlot *remoteslot,
> TupleTableSlot **localslot)
> {
> bool found;
>
> /*
> * Regardless of the top-level operation, we're performing a read here, so
> * check for SELECT privileges.
> */
> TargetPrivilegesCheck(localrel, ACL_SELECT);
>
> *localslot = table_slot_create(localrel, &estate->es_tupleTable);
>
> Assert(OidIsValid(localidxoid) ||
>    (remoterel->replident == REPLICA_IDENTITY_FULL));
>
> if (OidIsValid(localidxoid))
> found = RelationFindReplTupleByIndex(localrel, localidxoid,
> LockTupleExclusive,
> remoteslot, *localslot);
> else
> found = RelationFindReplTupleSeq(localrel, LockTupleExclusive,
> remoteslot, *localslot);
>
> return found;
> }
>
> ~
>
> Since that 'found' variable is not used, you might as well remove the
> if/else and simplify the code.
>
> SUGGESTION
> static bool
> FindReplTupleInLocalRel(EState *estate, Relation localrel,
> LogicalRepRelation *remoterel,
> Oid localidxoid,
> TupleTableSlot *remoteslot,
> TupleTableSlot **localslot)
> {
> /*
> * Regardless of the top-level operation, we're performing a read here, so
> * check for SELECT privileges.
> */
> TargetPrivilegesCheck(localrel, ACL_SELECT);
>
> *localslot = table_slot_create(localrel, &estate->es_tupleTable);
>
> Assert(OidIsValid(localidxoid) ||
>    (remoterel->replident == REPLICA_IDENTITY_FULL));
>
> if (OidIsValid(localidxoid))
> return RelationFindReplTupleByIndex(localrel, localidxoid,
> LockTupleExclusive,
> remoteslot, *localslot);
>
> return RelationFindReplTupleSeq(localrel, LockTupleExclusive,
> remoteslot, *localslot);
>
>
Maybe you are right, we don't need the variable. But I don't want to get
into
further discussions just because I'd change unrelated code to the patch.

So, I think I prefer to skip this change unless you have strong objections.


> ~~~
>
> 7. apply_handle_tuple_routing
>
> @@ -2890,6 +2877,7 @@ apply_handle_tuple_routing(ApplyExecutionData *edata,
>   TupleConversionMap *map;
>   MemoryContext oldctx;
>   LogicalRepRelMapEntry *part_entry = NULL;
> +
>   AttrMap    *attrmap = NULL;
>
>   /* ModifyTableState is needed for ExecFindPartition(). */
> The added whitespace seems unrelated to this patch.
>

Thanks, fixed


>
>
> ======
> src/include/replication/logicalrelation.h
>
> 8.
> @@ -31,6 +32,7 @@ typedef struct LogicalRepRelMapEntry
>   Relation localrel; /* relcache entry (NULL when closed) */
>   AttrMap    *attrmap; /* map of local attributes to remote ones */
>   bool updatable; /* Can apply updates/deletes? */
> + Oid localindexoid; /* which index to use, or InvalidOid if none */
>
> Indentation is not correct for that new member comment.
>

fixed, thanks


I'm attaching v36.


Thanks,
Onder

v36_0002_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data

v36_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data

Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher

Reply via email to