Re: Synchronizing slots from primary to standby

Amit Kapila Sat, 20 Jan 2024 02:44:42 -0800

On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapil...@gmail.com> wrote:
> >
> > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.ma...@gmail.com> wrote:
> > >
> >
> > I had some off-list discussions with Sawada-San, Hou-San, and Shveta
> > on the topic of extending replication commands instead of using the
> > current model where we fetch the required slot information via SQL
> > using a database connection. I would like to summarize the discussion
> > and would like to know the thoughts of others on this topic.
> >
> > In the current patch, we launch the slotsync worker on physical
> > standby which connects to the specified database (currently we let
> > users specify the required dbname in primary_conninfo) on the primary.
> > It then fetches the required information for failover marked slots
> > from the primary and also does some primitive checks on the upstream
> > node via SQL (the additional checks are like whether the upstream node
> > has a specified physical slot or whether the upstream node is a
> > primary node or a standby node). To fetch the required information it
> > uses a libpqwalreciever API which is mostly apt for this purpose as it
> > supports SQL execution but for this patch, we don't need a replication
> > connection, so we extend the libpqwalreciever connect API.
>
> What sort of extension we have done to 'libpqwalreciever'? Is it
> something like by default this supports replication connections so we
> have done an extension to the API so that we can provide an option
> whether to create a replication connection or a normal connection?
>


Yeah and in the future there could be more as well. The other function
added walrcv_get_dbname_from_conninfo doesn't appear to be a problem
either for now.

> > Now, the concerns related to this could be that users would probably
> > need to change existing mechanisms/tools to update priamry_conninfo
> > and one of the alternatives proposed is to have an additional GUC like
> > slot_sync_dbname. Users won't be able to drop the database this worker
> > is connected to aka whatever is specified in slot_sync_dbname but as
> > the user herself sets up the configuration it shouldn't be a big deal.
>
> Yeah for this purpose users may use template1 or so which they
> generally don't plan to drop.
>

Using template1 has other problems like users won't be able to create
a new database. See [2] (point number 2.2)

>
>  So in case the user wants to drop that
> database user needs to turn off the slot syncing option and then it
> can be done?
>

Right.

> > Then we also discussed whether extending libpqwalreceiver's connect
> > API is a good idea and whether we need to further extend it in the
> > future. As far as I can see, slotsync worker's primary requirement is
> > to execute SQL queries which the current API is sufficient, and don't
> > see something that needs any drastic change in this API. Note that
> > tablesync worker that executes SQL also uses these APIs, so we may
> > need something in the future for either of those. Then finally we need
> > a slotsync worker to also connect to a database to use SQL and fetch
> > results.
>
> While looking into the patch v64-0002 I could not exactly point out
> what sort of extensions are there in libpqwalreceiver.c, I just saw
> one extra API for fetching the dbname from connection info?
>

Right, the worry was that we may need it in the future.

> > Now, let us consider if we extend the replication commands like
> > READ_REPLICATION_SLOT and or introduce a new set of replication
> > commands to fetch the required information then we don't need a DB
> > connection with primary or a connection in slotsync worker. As per my
> > current understanding, it is quite doable but I think we will slowly
> > go in the direction of making replication commands something like SQL
> > because today we need to extend it to fetch all slots info that have
> > failover marked as true, the existence of a particular replication,
> > etc. Then tomorrow, if we want to extend this work to have multiple
> > slotsync workers say workers perdb then we have to extend the
> > replication command to fetch per-database failover marked slots. To
> > me, it sounds more like we are slowly adding SQL-like features to
> > replication commands.
> >
> > Apart from this when we are reading per-db replication slots without
> > connecting to a database, we probably need some additional protection
> > mechanism so that the database won't get dropped.
>
> Something like locking the database only while fetching the slots?
>

Possible, but can we lock the database from an auxiliary process?

> > Considering all this it seems that for now probably extending
> > replication commands can simplify a few things like mentioned above
> > but using SQL's with db-connection is more extendable.
>
> Even I have similar thoughts.
>

Thanks.

[1] - 
https://www.postgresql.org/message-id/CAJpy0uBhPx1MDHh903XpFAhpBH23KzVXyg_4VjH2zXk81oGi1w%40mail.gmail.com

-- 
With Regards,
Amit Kapila.

Re: Synchronizing slots from primary to standby

Reply via email to