Re: Synchronizing slots from primary to standby

shveta malik Tue, 21 Nov 2023 00:27:16 -0800

On Tue, Nov 21, 2023 at 10:02 AM Zhijie Hou (Fujitsu)
<houzj.f...@fujitsu.com> wrote:
>
> On Friday, November 17, 2023 7:39 PM Amit Kapila <amit.kapil...@gmail.com> 
> wrote:
> >
> > On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.ma...@gmail.com>
> > wrote:
> > >
> > > PFA v35.
> > >
> >
> > Review v35-0002*
> > ==============
>
> Thanks for the comments.
>
> > 1.
> > As quoted in the commit message,
> > >
> > If a logical slot is invalidated on the primary, slot on the standby is also
> > invalidated. If a logical slot on the primary is valid but is invalidated 
> > on the
> > standby due to conflict (say required rows removed on the primary), then 
> > that
> > slot is dropped and recreated on the standby in next sync-cycle.
> > It is okay to recreate such slots as long as these are not consumable on the
> > standby (which is the case currently).
> > >
> >
> > I think this won't happen normally because of the physical slot and
> > hot_standby_feedback but probably can occur in cases like if the user
> > temporarily switches hot_standby_feedback from on to off. Are there any 
> > other
> > reasons? I think we can mention the cases along with it as well at least 
> > for now.
> > Additionally, I think this should be covered in code comments as well.
>
> I will collect all these cases and update in next version.
>
> >
> > 2.
> >  #include "postgres.h"
> > -
> > +#include "access/genam.h"
> >
> > Spurious line removal.
>
> Removed.
>
> >
> > 3.
> >            A password needs to be provided too, if the sender demands
> > password
> >            authentication.  It can be provided in the
> >            <varname>primary_conninfo</varname> string, or in a separate
> > -          <filename>~/.pgpass</filename> file on the standby server (use
> > -          <literal>replication</literal> as the database name).
> > -          Do not specify a database name in the
> > -          <varname>primary_conninfo</varname> string.
> > +          <filename>~/.pgpass</filename> file on the standby server.
> > +         </para>
> > +         <para>
> > +          Specify <literal>dbname</literal> in
> > +          <varname>primary_conninfo</varname> string to allow
> > synchronization
> > +          of slots from the primary server to the standby server.
> > +          This will only be used for slot synchronization. It is ignored
> > +          for streaming.
> >
> > Is there a reason to remove part of the earlier sentence "use
> > <literal>replication</literal> as the database name"?
>
> Added it back.
>
> >
> > 4.
> > +       <primary><varname>enable_syncslot</varname> configuration
> > parameter</primary>
> > +      </indexterm>
> > +      </term>
> > +      <listitem>
> > +       <para>
> > +        It enables a physical standby to synchronize logical failover slots
> > +        from the primary server so that logical subscribers are not blocked
> > +        after failover.
> > +       </para>
> > +       <para>
> > +        It is enabled by default. This parameter can only be set in the
> > +        <filename>postgresql.conf</filename> file or on the server
> > command line.
> > +       </para>
> >
> > I think you forgot to update the documentation for the default value of this
> > variable.
>
> Updated.
>
> >
> > 5.
> > + *   a) start the logical replication workers for every enabled 
> > subscription
> > + *      when not in standby_mode
> > + *   b) start the slot-sync worker for logical failover slots 
> > synchronization
> > + *      from the primary server when in standby_mode.
> >
> > Either use a full stop after both lines or none of these.
>
> Added a full stop.
>
> >
> > 6.
> > +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker);
> >
> > There shouldn't be space between * and the worker.
>
> Removed, and added the type to typedefs.list.
>
> >
> > 7.
> > + if (!SlotSyncWorker->hdr.in_use)
> > + {
> > + LWLockRelease(SlotSyncWorkerLock);
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("replication slot-sync worker not initialized, "
> > + "cannot attach")));
> > + }
> > +
> > + if (SlotSyncWorker->hdr.proc)
> > + {
> > + LWLockRelease(SlotSyncWorkerLock);
> > + ereport(ERROR,
> > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
> > + errmsg("replication slot-sync worker is "
> > + "already running, cannot attach")));
> > + }
> >
> > Using slot-sync in the error messages looks a bit odd to me. Can we use
> > "replication slot sync worker ..." in both these and other similar 
> > messages? I
> > think it would be better if we don't split the messages into multiple lines 
> > in
> > these cases as messages don't appear too long to me.
>
> Changed as suggested.
>
> >
> > 8.
> > +/*
> > + * Detach the worker from DSM and update 'proc' and 'in_use'.
> > + * Logical replication launcher will come to know using these
> > + * that the worker has shutdown.
> > + */
> > +void
> > +slotsync_worker_detach(int code, Datum arg) {
> >
> > I think the reference to DSM is leftover from the previous version of the 
> > patch.
> > Can we change the above comments as per the new code?
>
> Changed.
>
> >
> > 9.
> > +static bool
> > +slotsync_worker_launch()
> > {
> > ...
> > + /* TODO: do we really need 'generation', analyse more here */
> > + worker->hdr.generation++;
> >
> > We should do something about this TODO. As per my understanding, we don't
> > need a generation number for the slot sync worker as we have one such worker
> > but I guess the patch requires it because we are using existing logical
> > replication worker infrastructure.


Yes, we do not need generation, but since we want to use  existing
logical-rep worker infrastructure, we can retain generation but can
keep it as zero always for the slot-sync worker case.

> > This brings the question of whether we really
> > need a separate SlotSyncWorkerInfo or if we can use existing
> > LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you
> > didn't use it because most of the fields in LogicalRepWorker will be unused 
> > for
> > slot sync worker.

Yes, right. If we use LogicalRepWorker in the slot-sync worker, then
it will be a task to keep a check (even in future) that no-one should
end up using uninitialized fields in slot-sync code. That is why
shifting common fields to LogicalWorkerHeader and using that in
SlotSyncWorkerInfo and LogicalRepWorker seems a better approach to me.

>
> Will think about this one and update in next version.
>
> >
> > 10.
> > + * Can't use existing functions like 'get_database_oid' from
> > +dbcommands.c for
> > + * validity purpose as they need db connection.
> > + */
> > +static bool
> > +validate_dbname(const char *dbname)
> >
> > I don't know how important it is to validate the dbname before launching the
> > sync slot worker because anyway after launching, it will give an error while
> > initializing the connection if the dbname is invalid. But, if we think it 
> > is really
> > required, did you consider using GetDatabaseTuple()?
>
> Yes, we could export GetDatabaseTuple. Apart from this, I am thinking is it 
> possible to
> release the restriction for the dbname. For example, slot sync worker could
> always connect to the 'template1' as the worker doesn't update the
> database objects. Although I didn't find some examples on server side, but 
> some
> client commands(e.g. pg_upgrade) will connect to template1 to check some 
> global
> objects. (Just FYI, the previous version patch used a replication command 
> which
> may avoid the dbname but was replaced with SELECT to improve the flexibility 
> and
> avoid introducing new command.)
>
> Best Regards,
> Hou zj
>
>

Re: Synchronizing slots from primary to standby

Reply via email to