Re: suboverflowed subtransactions concurrency performance optimize

Zhihong Yu Tue, 31 Aug 2021 10:21:22 -0700

On Mon, Aug 30, 2021 at 1:43 AM Pengchengliu <[email protected]>
wrote:


> Hi hackers,
> I wrote a patch to resolve the subtransactions concurrency performance
> problems when suboverflowed.
>
> When we use more than PGPROC_MAX_CACHED_SUBXIDS(64) subtransactions per
> transaction concurrency, it will lead to subtransactions performance
> problems.
> All backend will be stuck at acquiring lock SubtransSLRULock.
>
> The reproduce steps in PG master branch:
>
> 1, init a cluster, append postgresql.conf as below:
>
> max_connections = '2500'
> max_files_per_process = '2000'
> max_locks_per_transaction = '64'
> max_parallel_maintenance_workers = '8'
> max_parallel_workers = '60'
> max_parallel_workers_per_gather = '6'
> max_prepared_transactions = '15000'
> max_replication_slots = '10'
> max_wal_senders = '64'
> max_worker_processes = '250'
> shared_buffers = 8GB
>
> 2, create table and insert some records as below:
>
> CREATE UNLOGGED TABLE contend (
>     id integer,
>     val integer NOT NULL
> )
> WITH (fillfactor='50');
>
> INSERT INTO contend (id, val)
> SELECT i, 0
> FROM generate_series(1, 10000) AS i;
>
> VACUUM (ANALYZE) contend;
>
> 3, The script subtrans_128.sql in attachment. use pgbench with
> subtrans_128.sql as below.
> pgbench  -d postgres -p 33800 -n -r -f subtrans_128.sql  -c 500 -j 500 -T
> 3600
>
>
> 4, After for a while, we can get the stuck result. We can query
> pg_stat_activity. All backends wait event is SubtransSLRULock.
>    We can use pert top and try find the root cause. The result of perf top
> as below:
> 66.20%  postgres            [.] pg_atomic_compare_exchange_u32_impl
>   29.30%  postgres            [.] pg_atomic_fetch_sub_u32_impl
>    1.46%  postgres            [.] pg_atomic_read_u32
>    1.34%  postgres            [.] TransactionIdIsCurrentTransactionId
>    0.75%  postgres            [.] SimpleLruReadPage_ReadOnly
>    0.14%  postgres            [.] LWLockAttemptLock
>    0.14%  postgres            [.] LWLockAcquire
>    0.12%  postgres            [.] pg_atomic_compare_exchange_u32
>    0.09%  postgres            [.] HeapTupleSatisfiesMVCC
>    0.06%  postgres            [.] heapgetpage
>    0.03%  postgres            [.] sentinel_ok
>    0.03%  postgres            [.] XidInMVCCSnapshot
>    0.03%  postgres            [.] slot_deform_heap_tuple
>    0.03%  postgres            [.] ExecInterpExpr
>    0.02%  postgres            [.] AllocSetCheck
>    0.02%  postgres            [.] HeapTupleSatisfiesVisibility
>    0.02%  postgres            [.] LWLockRelease
>    0.02%  postgres            [.] TransactionIdPrecedes
>    0.02%  postgres            [.] SubTransGetParent
>    0.01%  postgres            [.] heapgettup_pagemode
>    0.01%  postgres            [.] CheckForSerializableConflictOutNeeded
>
>
> After view the subtrans codes, it is easy to find that the global LWLock
> SubtransSLRULock is the bottleneck of subtrans concurrently.
>
> When a bakcend session assign more than PGPROC_MAX_CACHED_SUBXIDS(64)
> subtransactions, we will get a snapshot with suboverflowed.
> A suboverflowed snapshot does not contain all data required to determine
> visibility, so PostgreSQL will occasionally have to resort to pg_subtrans.
> These pages are cached in shared buffers, but you can see the overhead of
> looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf
> output.
>
> To resolve this performance problem, we think about a solution which cache
> SubtransSLRU to local cache.
> First we can query parent transaction id from SubtransSLRU, and copy the
> SLRU page to local cache page.
> After that if we need query parent transaction id again, we can query it
> from local cache directly.
> It will reduce the number of acquire and release LWLock SubtransSLRULock
> observably.
>
> From all snapshots, we can get the latest xmin. All transaction id which
> precedes this xmin, it muse has been commited/abortd.
> Their parent/top transaction has been written subtrans SLRU. Then we can
> cache the subtrans SLRU and copy it to local cache.
>
> Use the same produce steps above, with our patch we cannot get the stuck
> result.
> Note that append our GUC parameter in postgresql.conf. This optimize is off
> in default.
> local_cache_subtrans_pages=128
>
> The patch is base on PG master branch
> 0d906b2c0b1f0d625ff63d9ace906556b1c66a68
>
>
> Our project in  https://github.com/ADBSQL/AntDB, Welcome to follow us,
> AntDB, AsiaInfo's PG-based distributed database product
>
> Thanks
> Pengcheng
>
> Hi,

+   uint16  valid_offset;   /* how many entry is valid */

how many entry is -> how many entries are

+int slru_subtrans_page_num = 32;

Looks like the variable represents the number of subtrans pages. Maybe name
the variable slru_subtrans_page_count ?

+               if (lbuffer->in_htab == false)

The condition can be written as 'if (!lbuffer->in_htab)'

For SubtransAllocLocalBuffer(), you can enclose the body of method in while
loop so that you don't use goto statement.

Cheers

Re: suboverflowed subtransactions concurrency performance optimize

Reply via email to