Hi,

We had a somewhat high-level discussion about these patches with Amit off-list, so I decided to also take a look at the actual code.

My main concern originally was the potential for left-over slots on the publisher, but I think the state now is relatively okay, with a couple of corner cases that are documented and don't seem much worse than the main slot.

I wonder if we should mention the max_slot_wal_keep_size GUC in the table sync docs though.

Another thing that might need documentation is that the visibility of changes done by table sync is no longer isolated: the table contents will show intermediate progress to other backends, rather than switching from nothing to a state consistent with the rest of the replication.


Some minor comments about the code:

+               else if (res->status == WALRCV_ERROR && missing_ok)
+               {
+                       /* WARNING. Error, but missing_ok = true. */
+                       ereport(WARNING,

I wonder if we need to add an error code to the WalRcvExecResult and check for the appropriate ones here. This can, for example, return an error because of a timeout, not because the slot is missing. Not sure if it matters for the current callers though (but then maybe don't call the param missing_ok?).
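
Just to illustrate what I mean, a rough sketch only: it assumes we'd carry the remote SQLSTATE in a new sqlstate field of WalRcvExecResult, and the check against ERRCODE_UNDEFINED_OBJECT is my invention, not something the patch has (slotname and res are whatever the surrounding function already uses):

        else if (res->status == WALRCV_ERROR &&
                 missing_ok &&
                 res->sqlstate == ERRCODE_UNDEFINED_OBJECT)
        {
                /* Slot is genuinely missing and the caller said that's okay. */
                ereport(WARNING,
                        (errmsg("could not drop replication slot \"%s\" on publisher: %s",
                                slotname, res->err)));
        }
        else if (res->status == WALRCV_ERROR)
        {
                /* Any other error (e.g. a timeout) should still be fatal. */
                ereport(ERROR,
                        (errmsg("could not drop replication slot \"%s\" on publisher: %s",
                                slotname, res->err)));
        }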


+ReplicationSlotNameForTablesync(Oid suboid, Oid relid, char syncslotname[NAMEDATALEN])
+{
+       if (syncslotname)
+               sprintf(syncslotname, "pg_%u_sync_%u", suboid, relid);
+       else
+               syncslotname = psprintf("pg_%u_sync_%u", suboid, relid);
+
+       return syncslotname;
+}

Given that we are now explicitly dropping slots, what happens here if we have 2 different downstreams that happen to get the same suboid and relid? Will one of them drop the slot of the other one? Previously, with the cleanup being left to the temp slot, we'd at most get an error when creating it, but with the new logic in LogicalRepSyncTableStart it feels like we could get into a situation where 2 downstreams are fighting over the slot, no?
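
One way out might be to make the name unique across clusters, e.g. by mixing in the subscriber's system identifier. Just a sketch to show the idea, I haven't thought hard about the best format:

char *
ReplicationSlotNameForTablesync(Oid suboid, Oid relid,
                                char syncslotname[NAMEDATALEN])
{
        /*
         * Sketch: append GetSystemIdentifier() so that two different
         * subscriber clusters which happen to reuse the same suboid/relid
         * pair cannot drop each other's slots.  Still fits in NAMEDATALEN.
         */
        if (syncslotname)
                sprintf(syncslotname, "pg_%u_sync_%u_" UINT64_FORMAT,
                        suboid, relid, GetSystemIdentifier());
        else
                syncslotname = psprintf("pg_%u_sync_%u_" UINT64_FORMAT,
                                        suboid, relid, GetSystemIdentifier());

        return syncslotname;
}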


--
Petr

