On Mon, Jul 24, 2023 at 8:03 AM Bharath Rupireddy
<bharath.rupireddyforpostg...@gmail.com> wrote:
>
> On Fri, Jul 21, 2023 at 5:16 PM shveta malik <shveta.ma...@gmail.com> wrote:
> >
> > Thanks Bharat for letting us know. It is okay to split the patch, it
> > may definitely help to understand the modules better but shall we take
> > a step back and try to reevaluate the design first before moving to
> > other tasks?
>
> Agree that design comes first. FWIW, I'm attaching the v9 patch set
> that I have with me. It can't be a perfect patch set unless the design
> is finalized.
>
> > I analyzed more on the issues stated in [1] for replacing LIST_SLOTS
> > with SELECT query. On rethinking, it might not be a good idea to
> > replace this cmd with SELECT in Launcher code-path
>
> I think there are open fundamental design aspects, before optimizing
> LIST_SLOTS, see below. I'm sure we can come back to this later.
>
> > Secondly, I was thinking if the design proposed in the patch is the
> > best one. No doubt, it is the most simplistic design and thus may
> > .......... Any feedback is appreciated.
>
> Here are my thoughts about this feature:
>
> Current design:
>
> 1. On primary, never allow walsenders associated with logical
> replication slots to go ahead of physical standbys that are candidates
> for future primary after failover. This enables subscribers to connect
> to new primary after failover.
> 2. On all candidate standbys, periodically sync logical slots from
> primary (creating the slots if necessary) with one slot sync worker
> per logical slot.
>
> Important considerations:
>
> 1. Does this design guarantee the row versions required by subscribers
> aren't removed on candidate standbys as raised here -
> https://www.postgresql.org/message-id/20220218222319.yozkbhren7vkjbi5%40alap3.anarazel.de?
>
> It seems safe with logical decoding on standbys feature.
> Also, a test-case from upthread is already in patch sets (in v9 too)
> https://www.postgresql.org/message-id/CAAaqYe9FdKODa1a9n%3Dqj%2Bw3NiB9gkwvhRHhcJNginuYYRCnLrg%40mail.gmail.com.
> However, we need to verify the use cases extensively.
>
Agreed.

> 2. All candidate standbys will start one slot sync worker per logical
> slot which might not be scalable.
>

Yeah, that doesn't sound like a good idea, but IIRC the proposed patch
is using one worker per database (for all slots corresponding to a
database).

> Is having one (or a few more - not
> necessarily one for each logical slot) worker for all logical slots
> enough?
>

I guess for a large number of slots there is a possibility of a large
gap in syncing the slots, which probably means we need to retain the
corresponding WAL for a much longer time on the primary. If we can
prove that the gap won't be large enough to matter, then this would
probably be worth considering; otherwise, I think we should find a way
to scale the number of workers to avoid the large gap.

-- 
With Regards,
Amit Kapila.
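
To put rough numbers on the sync-gap concern above, here is a
back-of-envelope sketch. It is not taken from the patch. It assumes the
slots are split evenly across workers, that each worker syncs its slots
one after another, and that the extra WAL held on the primary grows
linearly with the gap. The function names and the per-slot sync time
are hypothetical and only there for illustration.

```python
import math


def worst_case_sync_gap(num_slots: int, num_workers: int,
                        secs_per_slot: float) -> float:
    """Approximate worst-case time between two syncs of the same slot.

    Assumption: slots are divided evenly among workers and each worker
    processes its share sequentially, so a slot waits for one full pass
    over that worker's share.
    """
    if num_slots < 0 or num_workers <= 0 or secs_per_slot < 0:
        raise ValueError("invalid input")
    slots_per_worker = math.ceil(num_slots / num_workers)
    return slots_per_worker * secs_per_slot


def extra_wal_retained(gap_secs: float, wal_bytes_per_sec: float) -> float:
    """Assumed linear model: WAL generated on the primary during the gap."""
    return gap_secs * wal_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 1000 slots, 0.25 s to sync each one.
    for workers in (1, 4, 16):
        gap = worst_case_sync_gap(1000, workers, 0.25)
        wal = extra_wal_retained(gap, 1_000_000)  # 1 MB/s of WAL, assumed
        print(f"{workers:>2} workers: gap ~{gap} s, extra WAL ~{wal} bytes")
```

Under this simple model the gap shrinks roughly in proportion to the
number of workers, so whether a single worker is enough depends on the
per-slot sync cost and the WAL generation rate of the workload.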