On Mon, Jul 1, 2024 at 8:22 PM Hayato Kuroda (Fujitsu)
<kuroda.hay...@fujitsu.com> wrote:
>
> > I have a different but possibly-related complaint: why is
> > 040_pg_createsubscriber.pl so miserably slow?  On my machine it
> > runs for a bit over 19 seconds, which seems completely out of line
> > (for comparison, 010_pg_basebackup.pl takes 6 seconds, and the
> > other test scripts in this directory take much less).  It looks
> > like most of the blame falls on this step:
> >
> > [12:47:22.292](14.534s) ok 28 - run pg_createsubscriber on node S
> >
> > AFAICS the amount of data being replicated is completely trivial,
> > so that it doesn't make any sense for this to take so long --- and
> > if it does, that suggests that this tool will be impossibly slow
> > for production use.  But I suspect there is a logic flaw causing
> > this.
>
> I analyzed the issue. My elog() debugging showed that wait_for_end_recovery()
> wasted some time. This was because the recovery target was not yet
> considered reached.
>
> We are setting recovery_target_lsn to the return value of
> pg_create_logical_replication_slot(), which returns the end of the
> RUNNING_XACTS record. If we use the returned value as recovery_target_lsn
> as-is, however, we must wait for additional WAL to be generated, because
> the parameter requires that the replayed WAL overtake a certain point.
> In my environment, the function waited until the bgwriter emitted the
> next XLOG_RUNNING_XACTS record.
>

IIUC, the problem is that the consistent_lsn value returned by
setup_publisher() is the "end + 1" location of the required record,
whereas the recovery_target_lsn used in wait_for_end_recovery() expects
the "start" location of that record.
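
To make the start-vs-end mismatch concrete, here is a toy Python model (not PostgreSQL code). It assumes the default recovery_target_inclusive = on and models the stop check as "stop after the first replayed record whose start LSN is >= recovery_target_lsn". The record layout, the LSN values, and the helper names WalRecord and stop_record are all made up for illustration; alignment and page headers are ignored.

```python
from typing import NamedTuple, Optional


class WalRecord(NamedTuple):
    """Toy WAL record occupying [start_lsn, end_lsn); LSNs are plain ints."""
    name: str
    start_lsn: int
    end_lsn: int


def stop_record(records: list, target_lsn: int) -> Optional[WalRecord]:
    """Return the record after which replay would stop, or None if replay
    must keep waiting for more WAL (simplified recovery_target_lsn check,
    assuming recovery_target_inclusive = on)."""
    for rec in records:
        if rec.start_lsn >= target_lsn:
            return rec
    return None


# Hypothetical tail of the publisher's WAL right after slot creation.
wal = [
    WalRecord("COMMIT", 100, 140),
    WalRecord("RUNNING_XACTS", 140, 190),
]
running_xacts = wal[-1]

# Target = END of RUNNING_XACTS (what consistent_lsn holds today):
# no existing record starts at or after it, so replay keeps waiting.
print(stop_record(wal, running_xacts.end_lsn))  # prints None

# Target = START of RUNNING_XACTS: the record itself satisfies the check.
print(stop_record(wal, running_xacts.start_lsn).name)  # prints RUNNING_XACTS
```

In this model, handing the start LSN of the record to the stop check is enough; handing its end LSN leaves replay blocked until some later record appears.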

> One simple solution is to add an additional WAL record at the end of the
> publisher setup. IIUC, inserting an arbitrary WAL record can reduce the
> waiting time. The attached patch inserts a small XLOG_LOGICAL_MESSAGE
> record, which greatly reduced the execution time in my environment.
>
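
The workaround can be checked against a similar toy model (again not PostgreSQL code; the LSNs and the helpers stop_record and append_record are hypothetical, and alignment and page headers are ignored). Appending any small record after RUNNING_XACTS, such as a logical message (for example one emitted with pg_logical_emit_message(); the exact mechanism in the attached patch is not shown here), makes the unchanged end-of-record target reachable without waiting for the bgwriter.

```python
from typing import NamedTuple, Optional


class WalRecord(NamedTuple):
    name: str
    start_lsn: int
    end_lsn: int


def stop_record(records: list, target_lsn: int) -> Optional[WalRecord]:
    # Simplified check: stop after the first record starting at/after target.
    return next((r for r in records if r.start_lsn >= target_lsn), None)


def append_record(records: list, name: str, size: int) -> WalRecord:
    """Append a record right after the current end of WAL (toy model)."""
    start = records[-1].end_lsn
    rec = WalRecord(name, start, start + size)
    records.append(rec)
    return rec


wal = [WalRecord("RUNNING_XACTS", 140, 190)]
consistent_lsn = wal[-1].end_lsn  # end of RUNNING_XACTS, as today

print(stop_record(wal, consistent_lsn))       # prints None: would wait
append_record(wal, "LOGICAL_MESSAGE", 40)     # the extra record
print(stop_record(wal, consistent_lsn).name)  # prints LOGICAL_MESSAGE
```

The target itself is unchanged here; the extra record only gives replay something that starts at or after it, so this hides the start/end mismatch rather than fixing it.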

This sounds like an ugly hack to me, and I don't know if we can use it.
The ideal way to fix this is to get the start_lsn from the
create_logical_slot functionality or to have some parameter like
recover_target_end_lsn, but I don't know if this is a good time to
extend such functionality.

-- 
With Regards,
Amit Kapila.

