On Tue, Nov 21, 2023 at 10:02 AM Zhijie Hou (Fujitsu) <houzj.f...@fujitsu.com> wrote: > > On Friday, November 17, 2023 7:39 PM Amit Kapila <amit.kapil...@gmail.com> > wrote: > > > > On Thu, Nov 16, 2023 at 5:34 PM shveta malik <shveta.ma...@gmail.com> > > wrote: > > > > > > PFA v35. > > > > > > > Review v35-0002* > > ============== > > Thanks for the comments. > > > 1. > > As quoted in the commit message, > > > > > If a logical slot is invalidated on the primary, slot on the standby is also > > invalidated. If a logical slot on the primary is valid but is invalidated > > on the > > standby due to conflict (say required rows removed on the primary), then > > that > > slot is dropped and recreated on the standby in next sync-cycle. > > It is okay to recreate such slots as long as these are not consumable on the > > standby (which is the case currently). > > > > > > > I think this won't happen normally because of the physical slot and > > hot_standby_feedback but probably can occur in cases like if the user > > temporarily switches hot_standby_feedback from on to off. Are there any > > other > > reasons? I think we can mention the cases along with it as well at least > > for now. > > Additionally, I think this should be covered in code comments as well. > > I will collect all these cases and update in next version. > > > > > 2. > > #include "postgres.h" > > - > > +#include "access/genam.h" > > > > Spurious line removal. > > Removed. > > > > > 3. > > A password needs to be provided too, if the sender demands > > password > > authentication. It can be provided in the > > <varname>primary_conninfo</varname> string, or in a separate > > - <filename>~/.pgpass</filename> file on the standby server (use > > - <literal>replication</literal> as the database name). > > - Do not specify a database name in the > > - <varname>primary_conninfo</varname> string. > > + <filename>~/.pgpass</filename> file on the standby server. > > + </para> > > + <para> > > + Specify <literal>dbname</literal> in > > + <varname>primary_conninfo</varname> string to allow > > synchronization > > + of slots from the primary server to the standby server. > > + This will only be used for slot synchronization. It is ignored > > + for streaming. > > > > Is there a reason to remove part of the earlier sentence "use > > <literal>replication</literal> as the database name"? > > Added it back. > > > > > 4. > > + <primary><varname>enable_syncslot</varname> configuration > > parameter</primary> > > + </indexterm> > > + </term> > > + <listitem> > > + <para> > > + It enables a physical standby to synchronize logical failover slots > > + from the primary server so that logical subscribers are not blocked > > + after failover. > > + </para> > > + <para> > > + It is enabled by default. This parameter can only be set in the > > + <filename>postgresql.conf</filename> file or on the server > > command line. > > + </para> > > > > I think you forgot to update the documentation for the default value of this > > variable. > > Updated. > > > > > 5. > > + * a) start the logical replication workers for every enabled > > subscription > > + * when not in standby_mode > > + * b) start the slot-sync worker for logical failover slots > > synchronization > > + * from the primary server when in standby_mode. > > > > Either use a full stop after both lines or none of these. > > Added a full stop. > > > > > 6. > > +static void slotsync_worker_cleanup(SlotSyncWorkerInfo * worker); > > > > There shouldn't be space between * and the worker. > > Removed, and added the type to typedefs.list. > > > > > 7. > > + if (!SlotSyncWorker->hdr.in_use) > > + { > > + LWLockRelease(SlotSyncWorkerLock); > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("replication slot-sync worker not initialized, " > > + "cannot attach"))); > > + } > > + > > + if (SlotSyncWorker->hdr.proc) > > + { > > + LWLockRelease(SlotSyncWorkerLock); > > + ereport(ERROR, > > + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), > > + errmsg("replication slot-sync worker is " > > + "already running, cannot attach"))); > > + } > > > > Using slot-sync in the error messages looks a bit odd to me. Can we use > > "replication slot sync worker ..." in both these and other similar > > messages? I > > think it would be better if we don't split the messages into multiple lines > > in > > these cases as messages don't appear too long to me. > > Changed as suggested. > > > > > 8. > > +/* > > + * Detach the worker from DSM and update 'proc' and 'in_use'. > > + * Logical replication launcher will come to know using these > > + * that the worker has shutdown. > > + */ > > +void > > +slotsync_worker_detach(int code, Datum arg) { > > > > I think the reference to DSM is leftover from the previous version of the > > patch. > > Can we change the above comments as per the new code? > > Changed. > > > > > 9. > > +static bool > > +slotsync_worker_launch() > > { > > ... > > + /* TODO: do we really need 'generation', analyse more here */ > > + worker->hdr.generation++; > > > > We should do something about this TODO. As per my understanding, we don't > > need a generation number for the slot sync worker as we have one such worker > > but I guess the patch requires it because we are using existing logical > > replication worker infrastructure.
Yes, we do not need generation, but since we want to use existing logical-rep worker infrastructure, we can retain generation but can keep it as zero always for the slot-sync worker case. > > This brings the question of whether we really > > need a separate SlotSyncWorkerInfo or if we can use existing > > LogicalRepWorker and distinguish it with LogicalRepWorkerType? I guess you > > didn't use it because most of the fields in LogicalRepWorker will be unused > > for > > slot sync worker. Yes, right. If we use LogicalRepWorker in the slot-sync worker, then it will be a task to keep a check (even in future) that no-one should end up using uninitialized fields in slot-sync code. That is why shifting common fields to LogicalWorkerHeader and using that in SlotSyncWorkerInfo and LogicalRepWorker seems a better approach to me. > > Will think about this one and update in next version. > > > > > 10. > > + * Can't use existing functions like 'get_database_oid' from > > +dbcommands.c for > > + * validity purpose as they need db connection. > > + */ > > +static bool > > +validate_dbname(const char *dbname) > > > > I don't know how important it is to validate the dbname before launching the > > sync slot worker because anyway after launching, it will give an error while > > initializing the connection if the dbname is invalid. But, if we think it > > is really > > required, did you consider using GetDatabaseTuple()? > > Yes, we could export GetDatabaseTuple. Apart from this, I am thinking is it > possible to > release the restriction for the dbname. For example, slot sync worker could > always connect to the 'template1' as the worker doesn't update the > database objects. Although I didn't find some examples on server side, but > some > client commands(e.g. pg_upgrade) will connect to template1 to check some > global > objects. (Just FYI, the previous version patch used a replication command > which > may avoid the dbname but was replaced with SELECT to improve the flexibility > and > avoid introducing new command.) > > Best Regards, > Hou zj > >