On Tue, Apr 19, 2016 at 4:37 PM, Rob Brucks <rob.bru...@rackspace.com> wrote:
> I'd like to propose two enhancements to the PostgreSQL code, but I'm not > sure if this is the correct mailing list. So if it's not then please let > me know where I need to post this. > > These are monitoring-centric enhancement requests since I'm trying to > implement accurate monitoring in a secure fashion. > > * General monitoring: > We have a need for a "monitoring" role in PostgreSQL that has read-only > access to any "pg_stat" view. As of 9.4, only a super-user can read all > columns of "pg_stat_activity", "pg_stat_replication", and > "pg_stat_archiver" (there may be other restricted views as well). These > views provide critical insight on how well the cluster is operating and > what is going on. > > There appears to be only two ways to gain access to these views: > > 1. grant super-user to my monitoring user > 2. write custom functions owned by a super-user with "SECURITY > DEFINER" and grant access to my monitoring user > > Option 1 is too big of a security risk to grant to a monitoring user. If > the monitoring system gets compromised then the DB will be at risk too. > > Option 2 requires creating, deploying, and maintaining sets of monitoring > functions, which is a decent chunk of work in a large environment with > dozens or hundreds of deployments, many running different versions of > postgres possibly needing custom versions of the functions. When you add > the bureaucracy of a large IT organization and SOX and PCI compliance > requirements it ends up being a PITA implementing or changing these > functions when you only have a small DBA team. > > > * Streaming Replication Monitoring: > Make the "pg_stat_replication" view more persistent (maybe keep the rows > for 24 hours or have a registration process?). > > There appears to be no way to *accurately* monitor streaming replication > via SQL alone currently. This is due to three different problems: > > 1. When a standby gets disconnected from the master then the > corresponding row is immediately removed from pg_stat_replication on the > master. Once the row is gone you cannot tell via simple SQL whether a > standby is behind or not (or that it even existed at all) without storing > prior values and extrapolating from them. > 2. On the standby, if the WAL streaming process gets disconnected from > the master then it does not report that it is behind > because pg_last_xlog_receive_location() has not been updated from the > master. The standby has no idea how far ahead the master has gotten and > just blindly reports the last value. > 3. On a "quiet" system there may not be any update activity on the > master for a long time, which makes the pg_last_xact_replay_timestamp() > function report an ever-increasing interval. So it is not useable for > accurately measuring lag on quiet systems. > > Ideally the master should be able to report standby lag time via SQL, even > when there has been a disruption in connectivity with a standby. > > The only accurate method I have found to measure standby lag is to create > a synthetic update that runs periodically. This works, but is less than > ideal and requires adding a table to every cluster (which then has to be > vacuumed frequently too) and writing and maintaining a process to update it > and purge it. > > > These two enhancements would go a long way in making it easier to monitor > PostgreSQL clusters and replication because it would eliminate a lot of > custom coding requirements and enable us to pull metrics directly via > simple SQL. > > If anyone can provide insight on how I could accomplish these in a simple > manner by other means then I'm all ears! > > Thanks, > Rob > Rob, Enhancement or feature requests should probably go to *Custormer Feedback <https://postgresql.uservoice.com/forums/21853-general>* https://postgresql.uservoice.com/forums/21853-general -- *Melvin Davidson* I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.