On Fri, Jan 19, 2024 at 5:23 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.ma...@gmail.com> wrote: > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > on the topic of extending replication commands instead of using the > current model where we fetch the required slot information via SQL > using a database connection. I would like to summarize the discussion > and would like to know the thoughts of others on this topic. > > In the current patch, we launch the slotsync worker on physical > standby which connects to the specified database (currently we let > users specify the required dbname in primary_conninfo) on the primary. > It then fetches the required information for failover marked slots > from the primary and also does some primitive checks on the upstream > node via SQL (the additional checks are like whether the upstream node > has a specified physical slot or whether the upstream node is a > primary node or a standby node). To fetch the required information it > uses a libpqwalreciever API which is mostly apt for this purpose as it > supports SQL execution but for this patch, we don't need a replication > connection, so we extend the libpqwalreciever connect API. > > Now, the concerns related to this could be that users would probably > need to change existing mechanisms/tools to update priamry_conninfo > and one of the alternatives proposed is to have an additional GUC like > slot_sync_dbname. Users won't be able to drop the database this worker > is connected to aka whatever is specified in slot_sync_dbname but as > the user herself sets up the configuration it shouldn't be a big deal. > Then we also discussed whether extending libpqwalreceiver's connect > API is a good idea and whether we need to further extend it in the > future. As far as I can see, slotsync worker's primary requirement is > to execute SQL queries which the current API is sufficient, and don't > see something that needs any drastic change in this API. Note that > tablesync worker that executes SQL also uses these APIs, so we may > need something in the future for either of those. Then finally we need > a slotsync worker to also connect to a database to use SQL and fetch > results. > > Now, let us consider if we extend the replication commands like > READ_REPLICATION_SLOT and or introduce a new set of replication > commands to fetch the required information then we don't need a DB > connection with primary or a connection in slotsync worker. As per my > current understanding, it is quite doable but I think we will slowly > go in the direction of making replication commands something like SQL > because today we need to extend it to fetch all slots info that have > failover marked as true, the existence of a particular replication, > etc. Then tomorrow, if we want to extend this work to have multiple > slotsync workers say workers perdb then we have to extend the > replication command to fetch per-database failover marked slots. To > me, it sounds more like we are slowly adding SQL-like features to > replication commands. > > Apart from this when we are reading per-db replication slots without > connecting to a database, we probably need some additional protection > mechanism so that the database won't get dropped. > > Considering all this it seems that for now probably extending > replication commands can simplify a few things like mentioned above > but using SQL's with db-connection is more extendable. > > Thoughts? >
Bertrand, and others, do you have an opinion on this matter? -- With Regards, Amit Kapila.