pá 22. 3. 2019 v 10:49 odesílatel Aleš Zelený <zeleny.a...@gmail.com>
napsal:

> Hello,
>
> I've learned that logical replication might have performance problem if
> there are lot of sub transactions within transaction (at least because it
> enforces spill files in pg_replslot and if there are many - like 80mio,
> EXT4 did not perform  well - in my case it was caused bu misuse of
> exception handling in trigger function) and once we get rid of this usage
> pattern logical replication works like a charm.
> Until we have applied schema migration patch creating 700+ tables by
> flyway, therefore whole file was applied in single transaction (like psql
> -1 -f xxxx) and it effectively blocked our replication for 22 hours. No
> failures, wal sender processed did not consumed huge amount of memory (what
> happened in past wiht 80mi sub-trans caused by trigger), all wal senders
> did IO very rarely, most of the time they dwell on CPU.
>
> The spill files were not so big:
> # du -sh xid-924361123-lsn-FBC-C7000000.snap
> 224K    xid-924361123-lsn-FBC-C7000000.snap
>
> It takes about 1 minute (all the time 100% CPU used by was sender) to move
> to following file.
>
> Version, OS:
> PostgreSQL 10.6 (Ubuntu 10.6-1.pgdg16.04+1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609, 64-bit
>
> Perf:
> Overhead  Command   Shared Object      Symbol
>   57.79%  postgres  postgres           [.] hash_seq_search
>   14.70%  postgres  postgres           [.] hash_search_with_hash_value
>    5.93%  postgres  postgres           [.] LocalExecuteInvalidationMessage
>    3.60%  postgres  postgres           [.] RelfilenodeMapInvalidateCallback
>    2.99%  postgres  postgres           [.] hash_uint32
>    2.50%  postgres  postgres           [.] hash_search
>    1.91%  postgres  postgres           [.] CatCacheInvalidate
>    1.83%  postgres  postgres           [.] CallSyscacheCallbacks
>    1.37%  postgres  postgres           [.] hash_seq_init
>    1.24%  postgres  postgres           [.] hash_seq_term
>    1.02%  postgres  postgres           [.] RelationCacheInvalidateEntry
>    0.97%  postgres  postgres           [.] ReorderBufferCommit
>    0.97%  postgres  pgoutput.so        [.] rel_sync_cache_relation_cb
>    0.78%  postgres  postgres           [.] uint32_hash
>    0.73%  postgres  postgres           [.] PlanCacheRelCallback
>    0.60%  postgres  postgres           [.] InvalidateCatalogSnapshot
>    0.58%  postgres  postgres           [.] SysCacheInvalidate
>    0.35%  postgres  postgres           [.] GetCurrentTransactionNestLevel
>    0.13%  postgres  pgoutput.so        [.] _init
>    0.00%  postgres  [kernel.kallsyms]  [k] __do_softirq
>    0.00%  postgres  [kernel.kallsyms]  [k] _raw_spin_unlock_irqrestore
>    0.00%  postgres  postgres           [.] ResourceOwnerEnlargeRelationRefs
>
> My suspect is that psql (of flyway) while processing a file with many
> create table and grant statements in single transaction mode might create
> sub transactions, but I haven't found it in docs.
>

Maybe you have disabled autocommit with ON_ERROR_ROLLBACK on

I don't know about another case, where psql can use subtransactions.

Pavel


>
> Does you have similar experience? What is root cause for this issue?
>
> Workaround is clear - do not create that large amount of DDL in single
> transaction, but it would be nice to know root cause to eliminate
> scripts/usage patterns causing this issue generally.
>
> On the other hand, following transaction (100GB of spill files) , almost
> 150 mio rows deleted (and archived by statement trigger) behaves expected
> way - wal sender process was reading from disk (something like 130MB/s) and
> after ~20 minutes replication was in sync, all spill files were removed, so
> transaction changes volume itself does not seems to be any issue.
>
> Thanks for any advice!
>
> Kind regards Ales Zeleny
> P.S.: PostgreSQL log facility is superb, having DDL statements logged and
> XID in log line prefix I was able to identify the issue source in this case.
>

Reply via email to