On Sat, Jan 20, 2024 at 10:52 AM Dilip Kumar <dilipbal...@gmail.com> wrote: > > On Fri, Jan 19, 2024 at 5:24 PM Amit Kapila <amit.kapil...@gmail.com> wrote: > > > > On Wed, Jan 17, 2024 at 4:00 PM shveta malik <shveta.ma...@gmail.com> wrote: > > > > > > > I had some off-list discussions with Sawada-San, Hou-San, and Shveta > > on the topic of extending replication commands instead of using the > > current model where we fetch the required slot information via SQL > > using a database connection. I would like to summarize the discussion > > and would like to know the thoughts of others on this topic. > > > > In the current patch, we launch the slotsync worker on physical > > standby which connects to the specified database (currently we let > > users specify the required dbname in primary_conninfo) on the primary. > > It then fetches the required information for failover marked slots > > from the primary and also does some primitive checks on the upstream > > node via SQL (the additional checks are like whether the upstream node > > has a specified physical slot or whether the upstream node is a > > primary node or a standby node). To fetch the required information it > > uses a libpqwalreciever API which is mostly apt for this purpose as it > > supports SQL execution but for this patch, we don't need a replication > > connection, so we extend the libpqwalreciever connect API. > > What sort of extension we have done to 'libpqwalreciever'? Is it > something like by default this supports replication connections so we > have done an extension to the API so that we can provide an option > whether to create a replication connection or a normal connection? >
Yeah and in the future there could be more as well. The other function added walrcv_get_dbname_from_conninfo doesn't appear to be a problem either for now. > > Now, the concerns related to this could be that users would probably > > need to change existing mechanisms/tools to update priamry_conninfo > > and one of the alternatives proposed is to have an additional GUC like > > slot_sync_dbname. Users won't be able to drop the database this worker > > is connected to aka whatever is specified in slot_sync_dbname but as > > the user herself sets up the configuration it shouldn't be a big deal. > > Yeah for this purpose users may use template1 or so which they > generally don't plan to drop. > Using template1 has other problems like users won't be able to create a new database. See [2] (point number 2.2) > > So in case the user wants to drop that > database user needs to turn off the slot syncing option and then it > can be done? > Right. > > Then we also discussed whether extending libpqwalreceiver's connect > > API is a good idea and whether we need to further extend it in the > > future. As far as I can see, slotsync worker's primary requirement is > > to execute SQL queries which the current API is sufficient, and don't > > see something that needs any drastic change in this API. Note that > > tablesync worker that executes SQL also uses these APIs, so we may > > need something in the future for either of those. Then finally we need > > a slotsync worker to also connect to a database to use SQL and fetch > > results. > > While looking into the patch v64-0002 I could not exactly point out > what sort of extensions are there in libpqwalreceiver.c, I just saw > one extra API for fetching the dbname from connection info? > Right, the worry was that we may need it in the future. > > Now, let us consider if we extend the replication commands like > > READ_REPLICATION_SLOT and or introduce a new set of replication > > commands to fetch the required information then we don't need a DB > > connection with primary or a connection in slotsync worker. As per my > > current understanding, it is quite doable but I think we will slowly > > go in the direction of making replication commands something like SQL > > because today we need to extend it to fetch all slots info that have > > failover marked as true, the existence of a particular replication, > > etc. Then tomorrow, if we want to extend this work to have multiple > > slotsync workers say workers perdb then we have to extend the > > replication command to fetch per-database failover marked slots. To > > me, it sounds more like we are slowly adding SQL-like features to > > replication commands. > > > > Apart from this when we are reading per-db replication slots without > > connecting to a database, we probably need some additional protection > > mechanism so that the database won't get dropped. > > Something like locking the database only while fetching the slots? > Possible, but can we lock the database from an auxiliary process? > > Considering all this it seems that for now probably extending > > replication commands can simplify a few things like mentioned above > > but using SQL's with db-connection is more extendable. > > Even I have similar thoughts. > Thanks. [1] - https://www.postgresql.org/message-id/CAJpy0uBhPx1MDHh903XpFAhpBH23KzVXyg_4VjH2zXk81oGi1w%40mail.gmail.com -- With Regards, Amit Kapila.