pá 22. 3. 2019 v 10:49 odesílatel Aleš Zelený <zeleny.a...@gmail.com> napsal:
> Hello, > > I've learned that logical replication might have performance problem if > there are lot of sub transactions within transaction (at least because it > enforces spill files in pg_replslot and if there are many - like 80mio, > EXT4 did not perform well - in my case it was caused bu misuse of > exception handling in trigger function) and once we get rid of this usage > pattern logical replication works like a charm. > Until we have applied schema migration patch creating 700+ tables by > flyway, therefore whole file was applied in single transaction (like psql > -1 -f xxxx) and it effectively blocked our replication for 22 hours. No > failures, wal sender processed did not consumed huge amount of memory (what > happened in past wiht 80mi sub-trans caused by trigger), all wal senders > did IO very rarely, most of the time they dwell on CPU. > > The spill files were not so big: > # du -sh xid-924361123-lsn-FBC-C7000000.snap > 224K xid-924361123-lsn-FBC-C7000000.snap > > It takes about 1 minute (all the time 100% CPU used by was sender) to move > to following file. > > Version, OS: > PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu, > compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit > > Perf: > Overhead Command Shared Object Symbol > 57.79% postgres postgres [.] hash_seq_search > 14.70% postgres postgres [.] hash_search_with_hash_value > 5.93% postgres postgres [.] LocalExecuteInvalidationMessage > 3.60% postgres postgres [.] RelfilenodeMapInvalidateCallback > 2.99% postgres postgres [.] hash_uint32 > 2.50% postgres postgres [.] hash_search > 1.91% postgres postgres [.] CatCacheInvalidate > 1.83% postgres postgres [.] CallSyscacheCallbacks > 1.37% postgres postgres [.] hash_seq_init > 1.24% postgres postgres [.] hash_seq_term > 1.02% postgres postgres [.] RelationCacheInvalidateEntry > 0.97% postgres postgres [.] ReorderBufferCommit > 0.97% postgres pgoutput.so [.] rel_sync_cache_relation_cb > 0.78% postgres postgres [.] uint32_hash > 0.73% postgres postgres [.] PlanCacheRelCallback > 0.60% postgres postgres [.] InvalidateCatalogSnapshot > 0.58% postgres postgres [.] SysCacheInvalidate > 0.35% postgres postgres [.] GetCurrentTransactionNestLevel > 0.13% postgres pgoutput.so [.] _init > 0.00% postgres [kernel.kallsyms] [k] __do_softirq > 0.00% postgres [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore > 0.00% postgres postgres [.] ResourceOwnerEnlargeRelationRefs > > My suspect is that psql (of flyway) while processing a file with many > create table and grant statements in single transaction mode might create > sub transactions, but I haven't found it in docs. > Maybe you have disabled autocommit with ON_ERROR_ROLLBACK on I don't know about another case, where psql can use subtransactions. Pavel > > Does you have similar experience? What is root cause for this issue? > > Workaround is clear - do not create that large amount of DDL in single > transaction, but it would be nice to know root cause to eliminate > scripts/usage patterns causing this issue generally. > > On the other hand, following transaction (100GB of spill files) , almost > 150 mio rows deleted (and archived by statement trigger) behaves expected > way - wal sender process was reading from disk (something like 130MB/s) and > after ~20 minutes replication was in sync, all spill files were removed, so > transaction changes volume itself does not seems to be any issue. > > Thanks for any advice! > > Kind regards Ales Zeleny > P.S.: PostgreSQL log facility is superb, having DDL statements logged and > XID in log line prefix I was able to identify the issue source in this case. >