Hi,

We have been asked to add key monitoring and alerting to our Postgres database. I am thinking of metrics such as blocked transactions, maximum used transaction IDs, maximum active session threshold, deadlocks, long-running queries, replica lag, buffer cache hit ratio, read/write IOPS and latency, etc. I have the following questions:
1) Below are some queries I tried writing. Can you please let me know if these are accurate?
2) How should we write the alerting queries for deadlocks, max used transaction IDs, and read/write IOPS and latency?
3) Are there any docs available that have sample SQL queries on the pg_* tables for these critical alerts, which we can easily configure through any tool?
4) Are there any other alerts that we really should have?

**** Blocking sessions ****

SELECT DISTINCT blocking_id
FROM (
    SELECT activity.pid,
           activity.usename,
           activity.query,
           blocking.pid   AS blocking_id,
           blocking.query AS blocking_query
    FROM pg_stat_activity AS activity
    JOIN pg_stat_activity AS blocking
      ON blocking.pid = ANY(pg_blocking_pids(activity.pid))
) a;

**** Long running beyond ~1 hour ****

SELECT query,
       datname,
       pid,
       now() - state_change AS idle_for
FROM pg_stat_activity
WHERE state IN ('active', 'idle in transaction')
  AND pid <> pg_backend_pid()
  AND xact_start < now() - interval '1 hour'
ORDER BY age(backend_xmin) DESC NULLS LAST;

**** Number of active sessions ****

SELECT count(*) AS active_connections
FROM pg_stat_activity
WHERE state = 'active';

**** Replica lag ****

-- Column names as of PostgreSQL 10+ (sent_lsn etc.), matching pg_wal_lsn_diff()
SELECT client_addr,
       state,
       sent_lsn,
       write_lsn,
       flush_lsn,
       replay_lsn,
       pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replica_lag
FROM pg_stat_replication;

**** Buffer cache hit ratio ****

-- NULLIF avoids a division-by-zero error for databases with no block activity yet
SELECT datname,
       (1 - (blks_read::float / NULLIF(blks_hit + blks_read, 0))) * 100 AS buffer_cache_hit_ratio
FROM pg_stat_database;

Regards
Yudhi