Hi,

Are both servers configured with the same date, time, and time zone settings?
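A quick way to check, as a minimal sketch (assuming you can run a query on each node): run this on both the primary and the standby and compare the output, since a clock or time zone mismatch between the servers skews any lag measurement based on now():

    -- Run on both servers and compare the results.
    SELECT now() AS server_clock,
           current_setting('TimeZone') AS time_zone;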
On Tue, Oct 23, 2018 at 1:16 PM, Hellmuth Vargas (hiv...@gmail.com) wrote:

> Hi
>
> What result do you get from the following query?
>
> SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn()
>        THEN 0
>        ELSE EXTRACT (EPOCH FROM now() - pg_last_xact_replay_timestamp())
>        END AS log_delay;
>
> Source:
> https://severalnines.com/blog/postgresql-streaming-replication-deep-dive
>
> On Tue, Oct 23, 2018 at 11:28 AM, Boris Sagadin (bo...@infosplet.com) wrote:
>
>> Nothing special, just:
>>
>> standby_mode = 'on'
>> primary_conninfo = 'host=... user=repmgr application_name=nodex'
>> recovery_target_timeline = 'latest'
>>
>> Boris
>>
>> On Tue, Oct 23, 2018 at 3:10 PM, Hellmuth Vargas <hiv...@gmail.com> wrote:
>>
>>> Hi
>>>
>>> Can you share your recovery.conf file settings?
>>>
>>> On Tue, Oct 23, 2018 at 12:28 AM, Boris Sagadin (bo...@infosplet.com) wrote:
>>>
>>>> Yes, turning wal_compression off improves things. The slave mentioned
>>>> earlier unfortunately lagged too much before this setting was applied
>>>> and was turned off. However, the remaining slave lags less now, although
>>>> still occasionally up to a few minutes. I think the single-threadedness
>>>> of recovery is a big slowdown for write-heavy databases. Maybe v11's
>>>> option to increase the WAL segment size beyond 16MB will help.
>>>>
>>>> In the meantime we'll solve this by splitting the DB into 2 or 3
>>>> clusters, or maybe by trying out a sharding solution like Citus.
>>>>
>>>> Boris
>>>>
>>>> On Sun, Oct 21, 2018 at 9:06 AM, Boris Sagadin <bo...@infosplet.com> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I have a database running on an i3.8xlarge AWS instance (256GB RAM,
>>>>> 32 CPU cores, 4x 1.9TB NVMe drives) with about 5TB of disk space
>>>>> occupied, ext4, Ubuntu 16.04.
>>>>>
>>>>> It is a multi-tenant, insert-heavy DB with about 40000 tables.
>>>>>
>>>>> I started a new slave with identical HW specs, using streaming
>>>>> replication. The DB took about 4 hours to sync from the master, then
>>>>> it started applying the WALs. However, it seems it can't catch up.
>>>>> The delay is still around 3 hours (measured with now() -
>>>>> pg_last_xact_replay_timestamp()), even a day later. It fluctuates by
>>>>> a few hundred seconds, but it seems to float around the 3h mark.
>>>>>
>>>>> Disk IO is low at about 10% (measured with iostat), there are no
>>>>> connected clients, and the recovery process sits at around 90%
>>>>> single-core CPU usage.
>>>>>
>>>>> I tried tuning various parameters, to no avail. The only thing I
>>>>> found suspicious is that stracing the recovery process constantly
>>>>> produces many errors such as:
>>>>>
>>>>> lseek(428, 0, SEEK_END)    = 780124160
>>>>> lseek(30, 0, SEEK_END)     = 212992
>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource temporarily unavailable)
>>>>> lseek(680, 0, SEEK_END)    = 493117440
>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource temporarily unavailable)
>>>>> lseek(774, 0, SEEK_END)    = 583368704
>>>>>
>>>>> ...[snip]...
>>>>>
>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource temporarily unavailable)
>>>>> lseek(774, 0, SEEK_END)    = 583368704
>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource temporarily unavailable)
>>>>> lseek(277, 0, SEEK_END)    = 502882304
>>>>> lseek(6, 516096, SEEK_SET) = 516096
>>>>> read(6, "\227\320\5\0\1\0\0\0\0\340\7\246\26\274\0\0\315\0\0\0\0\0\0\0}\0178\5&/\260\r"..., 8192) = 8192
>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource temporarily unavailable)
>>>>> lseek(735, 0, SEEK_END)    = 272809984
>>>>> read(9, 0x7ffe4001f557, 1) = -1 EAGAIN (Resource temporarily unavailable)
>>>>> lseek(277, 0, SEEK_END)    = 502882304
>>>>>
>>>>> ls -l fd/9
>>>>> lr-x------ 1 postgres postgres 64 Oct 21 06:21 fd/9 -> pipe:[46358]
>>>>>
>>>>> Perf top on the recovery process shows:
>>>>>
>>>>>  27.76%  postgres            [.] pglz_decompress
>>>>>   9.90%  [kernel]            [k] entry_SYSCALL_64_after_swapgs
>>>>>   7.09%  postgres            [.] hash_search_with_hash_value
>>>>>   4.26%  libpthread-2.23.so  [.] llseek
>>>>>   3.64%  libpthread-2.23.so  [.] __read_nocancel
>>>>>   2.80%  [kernel]            [k] __fget_light
>>>>>   2.67%  postgres            [.] 0x000000000034d3ba
>>>>>   1.85%  [kernel]            [k] ext4_llseek
>>>>>   1.84%  postgres            [.] pg_comp_crc32c_sse42
>>>>>   1.44%  postgres            [.] hash_any
>>>>>   1.35%  postgres            [.] 0x000000000036afad
>>>>>   1.29%  postgres            [.] MarkBufferDirty
>>>>>   1.21%  postgres            [.] XLogReadRecord
>>>>>   [...]
>>>>>
>>>>> I tried raising the process limits with prlimit to unlimited, but
>>>>> nothing changed.
>>>>>
>>>>> I can turn off WAL compression, but I doubt this is the main culprit.
>>>>> Any ideas appreciated.
>>>>>
>>>>> Regards,
>>>>> Boris
>>>
>>> --
>>> Best regards,
>>>
>>> Ing. Hellmuth I. Vargas S.
>>> Specialist in Telematics and Internet Business
>>> Oracle Database 10g Administrator Certified Associate
>>> EnterpriseDB Certified PostgreSQL 9.3 Associate
>
> --
> Best regards,
>
> Ing. Hellmuth I. Vargas S.

--
Best regards,

Ing. Hellmuth I. Vargas S.
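For anyone hitting the same symptom: the EAGAIN reads on fd 9 (a pipe) are most likely just the recovery process draining its latch self-pipe in non-blocking mode, and are harmless. The perf profile, with pglz_decompress at 27.76%, points squarely at wal_compression, which matches the improvement reported after turning it off. A minimal sketch of that change (wal_compression takes effect on reload, no restart needed; run it on the primary, since compression happens when the WAL is generated):

    -- On the primary: stop compressing full-page images in new WAL records.
    ALTER SYSTEM SET wal_compression = off;
    SELECT pg_reload_conf();
    -- Confirm the new value:
    SHOW wal_compression;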
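Also worth noting: the timestamp-based delay used above (now() - pg_last_xact_replay_timestamp()) depends on the two servers' clocks agreeing and on commit traffic being continuous. A minimal sketch of an alternative that avoids both issues, using the same PostgreSQL 10+ function names as the query earlier in the thread, measures the replay gap in bytes on the standby:

    -- On the standby: how far replay trails the WAL already received, in bytes.
    -- Immune to clock differences between primary and standby.
    SELECT pg_wal_lsn_diff(pg_last_wal_receive_lsn(),
                           pg_last_wal_replay_lsn()) AS replay_lag_bytes;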