On Mon, Jul 1, 2024 at 8:22 PM Hayato Kuroda (Fujitsu) <kuroda.hay...@fujitsu.com> wrote:
>
> > I have a different but possibly-related complaint: why is
> > 040_pg_createsubscriber.pl so miserably slow?  On my machine it
> > runs for a bit over 19 seconds, which seems completely out of line
> > (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> > other test scripts in this directory take much less).  It looks
> > like most of the blame falls on this step:
> >
> > [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> >
> > AFAICS the amount of data being replicated is completely trivial,
> > so that it doesn't make any sense for this to take so long --- and
> > if it does, that suggests that this tool will be impossibly slow
> > for production use.  But I suspect there is a logic flaw causing
> > this.
>
> I analyzed the issue. My elog() debugging said that wait_for_end_recovery()
> wasted some time. This was caused by the recovery target seeming
> unsatisfactory.
>
> We are setting recovery_target_lsn by the return value of
> pg_create_logical_replication_slot(), which returns the end of the
> RUNNING_XACT record. If we use the returned value as recovery_target_lsn
> as-is, however, we must wait for additional WAL generation because the
> parameter requires that the replicated WAL overtake a certain point.
> On my env, the function waited until the bgwriter emitted the
> XLOG_RUNNING_XACTS record.
>
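To make the wait described above concrete, here is a toy model. It is only a sketch, not PostgreSQL code: the WalRecord type, the replay_until() helper and the LSN values are all made up for illustration, and it assumes that an inclusive LSN target is satisfied only once a record starting at or beyond the target has been replayed.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class WalRecord:
    """Hypothetical stand-in for a WAL record; LSNs are plain integers."""
    name: str
    start: int  # LSN where the record begins
    end: int    # "end + 1": LSN where the next record would begin


def replay_until(records: Iterable[WalRecord], target_lsn: int) -> Optional[WalRecord]:
    """Replay records in order and stop at the first one starting at or past target_lsn.

    Returns the record at which replay stopped, or None if the target was
    never reached (a real standby would keep waiting for more WAL).
    """
    for rec in records:
        if rec.start >= target_lsn:
            return rec
    return None


running_xacts = WalRecord("RUNNING_XACTS", start=100, end=150)
wal = [WalRecord("earlier", 50, 100), running_xacts]

# Target = end of the record: nothing starts at LSN 150 yet, so replay stalls.
stalled = replay_until(wal, running_xacts.end)

# Target = start of the record: replay stops once that record is replayed.
stopped = replay_until(wal, running_xacts.start)

# Any later record (for example a logical message) also unblocks the wait.
extra = WalRecord("LOGICAL_MESSAGE", start=150, end=170)
unblocked = replay_until(wal + [extra], running_xacts.end)

print(stalled, stopped.name, unblocked.name)
```

In this model the stall goes away either when the target points at the start of the record or when any further record is written after it.
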
IIUC, the problem is that the consistent_lsn value returned by
setup_publisher() is the "end + 1" location of the required LSN, whereas
the recovery_target_lsn used in wait_for_end_recovery() expects the LSN
value to be the "start" location of the required LSN.

> One simple solution is to add an additional WAL record at the end of the
> publisher setup. IIUC, an arbitrary WAL insertion can reduce the waiting
> time. The attached patch inserts a small XLOG_LOGICAL_MESSAGE record, which
> could reduce the execution time considerably on my environment.
>

This sounds like an ugly hack to me, and I don't know if we can use it.
The ideal way to fix this is to get the start_lsn from the
create_logical_slot functionality or to have some parameter like
recover_target_end_lsn, but I don't know if this is a good time to extend
such functionality.

--
With Regards,
Amit Kapila.