On Fri, Jun 14, 2019 at 08:02 Tiemen Ruiten <t.rui...@tech-lab.io> wrote:
> Hello,
>
> I setup a new 3-node cluster with the following specifications:
>
> 2x Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2*20 cores)
> 128 GB RAM
> 8x Crucial MX500 1TB SSDs
>
> FS is ZFS, the dataset with the PGDATA directory on it has the following
> properties (only non-default listed):
>
> NAME        PROPERTY       VALUE  SOURCE
> tank/pgsql  used           234G   -
> tank/pgsql  available      3.28T  -
> tank/pgsql  referenced     234G   -
> tank/pgsql  compressratio  2.68x  -
> tank/pgsql  compression    lz4    inherited from tank

You're possibly slowing yourself down a lot by using compression here.
Turning compression off, though, takes multiple steps: you have to disable
it on the filesystem and then rewrite/copy the data, since existing blocks
stay compressed. I'd first check in a scratch area how long it takes to
write ~3G of random data with compression on vs. off (rough sketch of such
a test at the bottom of this mail).

> tank/pgsql  atime          off    inherited from tank
> tank/pgsql  canmount       on     local
> tank/pgsql  xattr          sa     inherited from tank
>
> My postgresql.conf (only changed from default listed):
>
> hba_file = '/var/lib/pgsql/pg_hba.conf'
> listen_addresses = '*'
> max_connections = 800
> shared_buffers = 8GB
> work_mem = 64MB
> maintenance_work_mem = 2GB
> autovacuum_work_mem = 1GB
> dynamic_shared_memory_type = posix
> effective_io_concurrency = 200
> max_worker_processes = 50
> max_parallel_maintenance_workers = 8
> max_parallel_workers_per_gather = 8
> max_parallel_workers = 40
> wal_level = replica
> synchronous_commit = off
> full_page_writes = on
> wal_log_hints = on
> wal_buffers = 128MB
> checkpoint_timeout = 60min
> max_wal_size = 8GB
> min_wal_size = 1GB
> checkpoint_completion_target = 0.9
> archive_mode = on
> archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
> max_wal_senders = 10
> wal_keep_segments = 20
> hot_standby = on
> hot_standby_feedback = on
> random_page_cost = 1.5
> effective_cache_size = 48GB
> default_statistics_target = 500                # range 1-10000
> idle_in_transaction_session_timeout = 30min    # in milliseconds, 0 is disabled
> shared_preload_libraries = 'timescaledb, pg_cron'
> max_locks_per_transaction = 512
> timescaledb.max_background_workers = 8
>
> My problem is that checkpoints are taking a long time. Even when I run a
> few manual checkpoints one after the other, they keep taking very long, up
> to 10 minutes:
>
> 2019-06-14 15:21:10.351 CEST [23657] LOG: checkpoint starting: immediate force wait
> 2019-06-14 15:25:57.655 CEST [23657] LOG: checkpoint complete: wrote 139831 buffers (13.3%);
>   0 WAL file(s) added, 148 removed, 40 recycled; write=284.751 s, sync=0.532 s, total=287.304 s;
>   sync files=537, longest=0.010 s, average=0.000 s; distance=2745065 kB, estimate=2841407 kB
> 2019-06-14 15:26:01.988 CEST [23657] LOG: checkpoint starting: immediate force wait
> 2019-06-14 15:30:30.430 CEST [23657] LOG: checkpoint complete: wrote 238322 buffers (22.7%);
>   0 WAL file(s) added, 0 removed, 172 recycled; write=264.794 s, sync=0.415 s, total=268.441 s;
>   sync files=378, longest=0.011 s, average=0.001 s; distance=2834018 kB, estimate=2840668 kB
> 2019-06-14 15:30:44.097 CEST [23657] LOG: checkpoint starting: immediate force wait
> 2019-06-14 15:37:01.438 CEST [23657] LOG: checkpoint complete: wrote 132286 buffers (12.6%);
>   0 WAL file(s) added, 54 removed, 96 recycled; write=366.614 s, sync=2.975 s, total=377.341 s;
>   sync files=467, longest=0.095 s, average=0.006 s; distance=2444291 kB, estimate=2801030 kB
>
> What is going on? It doesn't seem like normal behaviour?
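For the scratch-area comparison above, something along these lines would
do. The dataset names and paths are just examples, adjust them to your
pool layout; it also assumes Linux (/dev/shm) so the source data comes
from RAM rather than /dev/urandom during the timed part:

    # two throwaway datasets, one with lz4, one without (example names)
    zfs create -o compression=lz4 tank/scratch_lz4
    zfs create -o compression=off tank/scratch_off

    # generate ~3G of random data once, in RAM, so /dev/urandom
    # isn't the bottleneck while timing
    dd if=/dev/urandom of=/dev/shm/rand3g bs=1M count=3072

    # time the same write into each dataset, including the flush to disk
    time sh -c 'cp /dev/shm/rand3g /tank/scratch_lz4/ && sync'
    time sh -c 'cp /dev/shm/rand3g /tank/scratch_off/ && sync'

    # clean up
    zfs destroy tank/scratch_lz4
    zfs destroy tank/scratch_off
    rm /dev/shm/rand3g

If the two numbers come out close, lz4 probably isn't what's hurting you
and you can leave compression alone.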
> -- "Genius might be described as a supreme capacity for getting its possessors into trouble of all kinds." -- Samuel Butler