On Tue, Nov 29, 2022 at 11:20 AM SATYANARAYANA NARLAPURAM < satyanarlapu...@gmail.com> wrote:
>
> On Tue, Nov 29, 2022 at 10:52 AM Andrey Borodin <amborodi...@gmail.com>
> wrote:
>
>> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian <br...@momjian.us> wrote:
>> >
>> > On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote:
>> > > 2. Process proc die immediately when a backend is waiting for sync
>> > > replication acknowledgement, as it does today, however, upon restart,
>> > > don't open up for business (don't accept read-only connections)
>> > > unless the sync standbys have caught up.
>> > >
>> > > Are you planning to block connections or queries to the database? It would be
>> > > good to allow connections and let them query the monitoring views but block the
>> > > queries until sync standby have caught up. Otherwise, this leaves a monitoring
>> > > hole. In cloud, I presume superusers are allowed to connect and monitor (end
>> > > customers are not the role members and can't query the data). The same can't be
>> > > true for all the installations. Could you please add more details on your
>> > > approach?
>> >
>> > I think ALTER SYSTEM should be allowed, particularly so you can modify
>> > synchronous_standby_names, no?
>>
>> We don't allow SQL access during crash recovery until it's caught up
>> to consistency point. And that's for a reason - the cluster may have
>> invalid system catalog.
>> So no, after crash without a quorum of standbys you can only change
>> auto.conf and send SIGHUP. Accessing the system catalog during crash
>> recovery is another unrelated problem.
>>
>
> In the crash recovery case, catalog is inconsistent but in this case, the
> cluster has remote uncommitted changes (consistent). Accepting a superuser
> connection is no harm. The auth checks performed are still valid after
> standbys fully caught up. I don't see a reason why superuser / pg_monitor
> connections are required to be blocked.
>

If blocking queries is harder, and superuser is not allowed to connect as
it can read remote uncommitted data, how about adding a new role that can
update and reload the server configuration?

>
>> But I'd propose to treat these two points differently, they possess
>> drastically different scales of danger. Query Cancels are issued here
>> and there during failovers\switchovers. Crash amidst network
>> partitioning is not that common.
>>
>
> Supportability and operability are more important in corner cases to
> quickly troubleshoot an issue.
>
>> Best regards, Andrey Borodin.
>>