On Wed, May 22, 2024 at 8:46 PM Euler Taveira <eu...@eulerto.com> wrote:
>
> On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
>
> > v2-0002: not changed
> >
>
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.
>
>
> Following the same line that simplifies the code, we can: (a) add a loop in
> check_subscriber() that waits until walreceiver is available on subscriber or
> (b) use a timeout. The main advantage of (a) is that the primary slot is
> already available but I'm afraid we need an escape mechanism for the loop (timeout?).
>
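For illustration only, option (a) in the quoted text (a wait loop whose escape mechanism is a timeout) could look roughly like the minimal C sketch below. It is not code from pg_createsubscriber: wait_until_ready(), ready_fn, sleep_ms() and the countdown predicate are hypothetical names, and a real check would have to query the standby (for example, whether a walreceiver is running) instead of calling a stub.

```c
#define _POSIX_C_SOURCE 199309L	/* expose nanosleep() under strict C modes */

#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical predicate type: returns true once the awaited condition
 * holds (e.g. the standby's walreceiver is up). Not a PostgreSQL API.
 */
typedef bool (*ready_fn) (void *arg);

/* Sleep for the given number of milliseconds using POSIX nanosleep(). */
static void
sleep_ms(long ms)
{
	struct timespec ts;

	ts.tv_sec = ms / 1000;
	ts.tv_nsec = (ms % 1000) * 1000000L;
	nanosleep(&ts, NULL);
}

/*
 * Poll is_ready() every interval_ms until it returns true or about
 * timeout_ms has elapsed. The timeout is the escape mechanism: the loop
 * cannot spin forever on a slow or misconfigured setup.
 * Returns true if the condition was observed, false on timeout.
 */
static bool
wait_until_ready(ready_fn is_ready, void *arg, long timeout_ms, long interval_ms)
{
	long		waited_ms = 0;

	assert(timeout_ms >= 0 && interval_ms > 0);

	for (;;)
	{
		if (is_ready(arg))
			return true;
		if (waited_ms >= timeout_ms)
			return false;		/* give up instead of looping forever */
		sleep_ms(interval_ms);
		waited_ms += interval_ms;
	}
}

/* Hypothetical demo predicate: becomes true on the Nth call. */
struct countdown
{
	int			remaining;
};

static bool
ready_after_countdown(void *arg)
{
	struct countdown *c = arg;

	return --c->remaining <= 0;
}
```

The sketch does not remove the concern quoted above: on a slow enough machine, any fixed timeout can still expire before the condition becomes true, so the choice of timeout value remains a judgment call.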
Sorry, it is not clear to me why we need any additional loop in
check_subscriber(). Aren't we talking about the problem in the
check_publisher() function?

Why do we need to ensure that primary_slot_name is active on the
primary in the first place? You mentioned something related to WAL
retention, but I don't see how that relates to this tool's
functionality. If we are concerned about WAL retention on the primary
at all, what matters is the WAL corresponding to the consistent_lsn
computed by setup_publisher(), but this check doesn't seem to ensure
that.

--
With Regards,
Amit Kapila.