On Thu, Jul 28, 2022 at 9:32 PM Melih Mutlu <m.melihmu...@gmail.com> wrote:
>>
>> Why after step 4, do you need to drop the replication slot? Won't just
>> clearing the required info from the catalog be sufficient?
>
>
> The replication slots that we read from the catalog will not be used for 
> anything else after we're done with syncing the table which the rep slot 
> belongs to.
> It's removed from the catalog when the sync is completed and it basically 
> becomes a slot that is not linked to any table or worker. That's why I think 
> it should be dropped rather than left behind.
>
> Note that if a worker dies and its replication slot continues to exist, that 
> slot will only be used to complete the sync process of the one table that the 
> dead worker was syncing but couldn't finish.
> When that particular table is synced and becomes ready, the replication slot 
> has no use anymore.
>

Why can't it be used to sync other tables, if there are any?

>>
>> Hmm, I think that even if there is only an iota of a chance, which I
>> think there is, we can't use worker_pid. Suppose the same worker_pid is
>> assigned to another worker after the worker that was using it errored
>> out; the new worker will fail as soon as it tries to create a
>> replication slot.
>
>
> Right. If something like that happens, the worker will fail without doing
> anything. Then a new one will be launched and will continue the work.
> The worst case would be hitting a conflicting pid over and over again while
> replication slots whose names include those pids still exist.
> It seems unlikely but possible, yes.
>
>>
>> I feel it would be better, or maybe we need to think of some other
>> identifier. But one thing we need to think about before using a 64-bit
>> unique identifier here is how we will retrieve its last used value
>> after a server restart. We may need to store it persistently
>> somewhere.
>
>
> We might consider storing this info in a catalog again. Since this last used 
> value will be different for each subscription, pg_subscription can be a good 
> place to keep that.
>

This sounds reasonable. Let's do this unless we get some better idea.
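
To make the idea concrete, here is a rough sketch of naming a sync slot from
a per-subscription counter instead of worker_pid. Everything in it is an
assumption for illustration: the SubscriptionState struct, its last_slot_id
field, the next_sync_slot_name() helper, and the "pg_<subid>_sync_<id>" name
format are hypothetical and do not come from the patch or from the actual
pg_subscription catalog. The point is only that a counter that never goes
backwards, and that is stored persistently, cannot collide the way a reused
pid can.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the per-subscription state. In the proposal the
 * last used value would live in a catalog (for example pg_subscription) so
 * that it survives a server restart; here it is just a struct in memory.
 */
typedef struct SubscriptionState
{
    uint32_t subid;         /* subscription OID */
    uint64_t last_slot_id;  /* last identifier handed out; must be persisted */
} SubscriptionState;

/*
 * Advance the counter and build a slot name from it. Returns the value of
 * snprintf(), so the caller can detect truncation. The counter is bumped
 * before the name is built, so a name is never handed out twice as long as
 * last_slot_id is written back persistently before the slot is created.
 */
static int
next_sync_slot_name(SubscriptionState *sub, char *buf, size_t buflen)
{
    sub->last_slot_id++;
    return snprintf(buf, buflen, "pg_%" PRIu32 "_sync_%" PRIu64,
                    sub->subid, sub->last_slot_id);
}
```

With the longest possible values (a 10-digit OID and a 20-digit counter)
such a name is 39 characters, which still fits within the 63-character limit
on slot names.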

>>
>> The problems will be similar to the slot name. The origin is used to
>> track the progress of replication, so, if we use the wrong origin name
>> after the restart, it can send the wrong start_streaming position to
>> the publisher.
>
>
> I understand. But the origin naming logic is still the same. Its format is
> pg_<subid>_<relid>.
> I did not need to change this, since it seems to me that an origin should
> belong to only one table. The patch does not reuse origins.
> So I don't think this change introduces an issue with origins. What do you
> think?
>

There is no restriction that an origin must belong to only one
table. What makes you think so?

-- 
With Regards,
Amit Kapila.

