Hi,

I recently ran into a problem in one of our production PostgreSQL clusters. I noticed lock contention on ProcArrayLock on the standby, which causes WAL replay lag to grow. To reproduce this, you can do the following:

1) set max_connections to a big number, like 100000
2) begin a transaction on the primary
3) start a pgbench workload on the primary and on the standby

After a while, KnownAssignedXidsGetAndSetXmin shows up in perf top consuming about 75% of CPU.

%%
   PerfTop:    1060 irqs/sec  kernel: 0.0%  exact:  0.0% [4000Hz cycles:u],  (target_pid: 273361)
-------------------------------------------------------------------------------

    73.92%  postgres       [.] KnownAssignedXidsGetAndSetXmin
     1.40%  postgres       [.] base_yyparse
     0.96%  postgres       [.] LWLockAttemptLock
     0.84%  postgres       [.] hash_search_with_hash_value
     0.84%  postgres       [.] AtEOXact_GUC
     0.72%  postgres       [.] ResetAllOptions
     0.70%  postgres       [.] AllocSetAlloc
     0.60%  postgres       [.] _bt_compare
     0.55%  postgres       [.] core_yylex
     0.42%  libc-2.27.so   [.] __strlen_avx2
     0.23%  postgres       [.] LWLockRelease
     0.19%  postgres       [.] MemoryContextAllocZeroAligned
     0.18%  postgres       [.] expression_tree_walker.part.3
     0.18%  libc-2.27.so   [.] __memmove_avx_unaligned_erms
     0.17%  postgres       [.] PostgresMain
     0.17%  postgres       [.] palloc
     0.17%  libc-2.27.so   [.] _int_malloc
     0.17%  postgres       [.] set_config_option
     0.17%  postgres       [.] ScanKeywordLookup
     0.16%  postgres       [.] _bt_checkpage
%%

We tried to fix this by using a Bitmapset instead of the boolean array KnownAssignedXidsValid, but that did not help much. Using a doubly linked list helps a little more: we got +1000 tps on the pgbench workload with the patched PostgreSQL.

The general idea of this patch is that, instead of remembering which elements in KnownAssignedXids are valid, let's maintain a doubly linked list of them. This solution works in exactly the same way, except that taking a snapshot on the replica is now O(number of running transactions) instead of O(head - tail), which is significantly faster under some workloads. The patch reduces the CPU usage of KnownAssignedXidsGetAndSetXmin to ~48% instead of ~74%, but does not eliminate it from perf top.

The problem is easier to reproduce on PG13, since PG14 includes some snapshot optimizations.

Thanks!

Best regards, reshke
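
P.S. To illustrate the complexity difference described above, here is a minimal standalone sketch. It is not the patch itself and does not use PostgreSQL's real data structures: XidStore, store_add, store_remove, scan_array and scan_list are hypothetical names made up for this example, and removal takes a slot index directly instead of searching for the xid. It only shows why walking a linked list of valid slots visits fewer entries than scanning the whole [tail, head) range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 1024
#define NIL   (-1)

typedef uint32_t TransactionId;

/* Hypothetical stand-in for the KnownAssignedXids arrays. */
typedef struct {
    TransactionId xids[SLOTS];   /* xids in insertion order */
    bool          valid[SLOTS];  /* analogous to KnownAssignedXidsValid */
    int           prev[SLOTS];   /* previous valid slot, or NIL */
    int           next[SLOTS];   /* next valid slot, or NIL */
    int           tail, head;    /* used range is [tail, head) */
    int           first, last;   /* ends of the list of valid slots */
} XidStore;

static void store_init(XidStore *s)
{
    s->tail = s->head = 0;
    s->first = s->last = NIL;
}

/* Append an xid at head and link its slot at the end of the list. */
static int store_add(XidStore *s, TransactionId xid)
{
    assert(s->head < SLOTS);
    int slot = s->head++;
    s->xids[slot] = xid;
    s->valid[slot] = true;
    s->prev[slot] = s->last;
    s->next[slot] = NIL;
    if (s->last != NIL)
        s->next[s->last] = slot;
    else
        s->first = slot;
    s->last = slot;
    return slot;
}

/* Mark a slot invalid and unlink it in O(1). */
static void store_remove(XidStore *s, int slot)
{
    assert(slot >= s->tail && slot < s->head && s->valid[slot]);
    s->valid[slot] = false;
    if (s->prev[slot] != NIL)
        s->next[s->prev[slot]] = s->next[slot];
    else
        s->first = s->next[slot];
    if (s->next[slot] != NIL)
        s->prev[s->next[slot]] = s->prev[slot];
    else
        s->last = s->prev[slot];
}

/* Array-style scan: visits every slot in [tail, head). */
static int scan_array(const XidStore *s, TransactionId *out, int *visited)
{
    int n = 0;
    *visited = 0;
    for (int i = s->tail; i < s->head; i++) {
        (*visited)++;
        if (s->valid[i])
            out[n++] = s->xids[i];
    }
    return n;
}

/* List-style scan: visits only the slots that are still valid. */
static int scan_list(const XidStore *s, TransactionId *out, int *visited)
{
    int n = 0;
    *visited = 0;
    for (int i = s->first; i != NIL; i = s->next[i]) {
        (*visited)++;
        out[n++] = s->xids[i];
    }
    return n;
}
```

For example, after adding 1000 xids and removing all but the oldest one (roughly the case where a single old transaction keeps the tail from advancing), scan_array still visits 1000 slots while scan_list visits 1, and both return the same single xid. The cost is two extra int arrays and a little more work on every add and remove.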