Hello,

In my previous post[1] on this list I brought up an issue with long-running checkpoints. I reduced checkpoint_timeout to a more reasonable value (15m, down from 60m) and forced/immediate checkpoints now mostly complete in under a minute. That thread, and another one[2] on the Clusterlabs mailing list, also helped me understand more about how PostgreSQL internals work, thanks everyone!

Now to my current issue: I took the advice to add more monitoring on replay lag (using pg_last_xact_replay_timestamp) and things are not looking good.
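For context, the standby-side check is essentially a query along these lines (a minimal sketch, the actual monitoring query may differ slightly; the CASE guard avoids reporting stale lag when the standby is fully caught up, since pg_last_xact_replay_timestamp() keeps returning the last replayed commit time while the primary is idle):

  SELECT CASE
           WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0
           ELSE EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp())
         END AS replay_lag_seconds;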
Last night replication lagged by almost 6 hours on one of the nodes[3], but eventually caught up. As you can see in that screenshot, ph-sql-03 is consistently slower to replay than ph-sql-05 (ph-sql-04 is the current master), and ph-sql-03 has different SSDs (Crucial MX300, vs Crucial MX500 in the other two), which makes me think this is I/O related.

When I check the replay_lag column of pg_stat_replication, the numbers are consistent with the data from pg_last_xact_replay_timestamp:

postgres=# SELECT application_name, replay_lag FROM pg_stat_replication;
 application_name |   replay_lag
------------------+-----------------
 ph-sql-03        | 00:15:16.179952
 ph-sql-05        | 00:10:01.078163

Currently this doesn't present an operational issue, as the slaves aren't used by applications (still waiting for development to make the necessary changes), so there are no queries running on the slaves at all apart from the occasional monitoring.

Cluster specifications:

all nodes:
- filesystem: ZFS stripe of mirrors
- 2x CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores total)
- 128 GB RAM

ph-sql-03:     8x Crucial MX300 1050GB, underprovisioned to 1TB
ph-sql-0{4,5}: 8x Crucial MX500 1TB

postgresql.conf with GUCs changed from default:

shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = off
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 15min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500
shared_preload_libraries = 'timescaledb, pg_cron, pg_prewarm'  # (change requires restart)
max_locks_per_transaction = 512

What are possible reasons for the high replay_lag? Is my storage just too slow? Are there any tunables available?

[1] https://www.postgresql.org/message-id/flat/CAEkBuzeno6ztiM1g4WdzKRJFgL8b2nfePNU%3Dq3sBiEZUm-D-sQ%40mail.gmail.com
[2] https://lists.clusterlabs.org/pipermail/users/2019-July/025967.html
[3] https://ibb.co/0sncjBZ

Thank you,
Tiemen Ruiten