RE: suboverflowed subtransactions concurrency performance optimize
Hi Andrey,

> I think if we really want to fix exclusive SubtransSLRULock I think best
> option would be to split SLRU control lock into array of locks

I agree with you. If we can resolve the performance issue with this approach, it should be a good solution.

> one for each bank (in
> v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch)

I have tested with this patch, after modifying NUM_SUBTRANS_BUFFERS to 128. With 500 concurrent clients it indeed no longer gets stuck, but the performance is very bad: a sequential scan of the table takes more than one minute. I think that is unacceptable in a production environment.

postgres=# select count(*) from contend ;
 count
-------
 10127
(1 row)

Time: 86011.593 ms (01:26.012)

postgres=# select count(*) from contend ;
 count
-------
 10254
(1 row)

Time: 79399.949 ms (01:19.400)

With my local subtrans cache approach, in the same environment, with the same test script and 500 concurrent clients, a sequential scan takes less than 10 seconds.

postgres=# select count(*) from contend ;
 count
-------
 10508
(1 row)

Time: 7104.283 ms (00:07.104)

postgres=# select count(*) from contend ;
 count
-------
 13175
(1 row)

Time: 6602.635 ms (00:06.603)

Thanks
Pengcheng

-----Original Message-----
From: Andrey Borodin
Sent: 3 September 2021, 14:51
To: Pengchengliu
Cc: pgsql-hack...@postgresql.org
Subject: Re: suboverflowed subtransactions concurrency performance optimize

Sorry, for some reason Mail.app converted the message to HTML and the mailing list mangled this HTML into a mess. I'm resending the previous message as plain text again. Sorry for the noise.

> On 31 Aug 2021, at 11:43, Pengchengliu wrote:
>
> Hi Andrey,
> Thanks a lot for your reply and the reference information.
>
> The default NUM_SUBTRANS_BUFFERS is 32. In my implementation,
> local_cache_subtrans_pages can be adjusted dynamically.
> If we configure local_cache_subtrans_pages as 64, each backend uses only
> 64*8192 = 512KB of extra memory.
> So the local cache is similar to a first-level cache,
> and the subtrans SLRU is the second-level cache.
> I think the extra memory is well worth it. It really resolves the massive
> subtransaction stuck issue which I mentioned in my previous email.
>
> I have reviewed the patch [0] before. Adding GUC parameters for the SLRU
> buffer sizes is very nice.
> But I think for subtrans this optimization is not enough. In
> SubTransGetTopmostTransaction, we should acquire SubtransSLRULock once and
> then call SubTransGetParent in a loop, to avoid acquiring/releasing
> SubtransSLRULock on every iteration of
> SubTransGetTopmostTransaction->SubTransGetParent.
> After I applied this patch together with my SubTransGetTopmostTransaction
> optimization, my test case still got stuck.

SubTransGetParent() acquires only a shared lock on SubtransSLRULock. The problem may arise only when someone reads a page from disk. But if you have a big enough cache, this will never happen. And this cache will be much smaller than 512KB * max_connections.

I think if we really want to fix the exclusive SubtransSLRULock, the best option would be to split the SLRU control lock into an array of locks - one for each bank (as in v17-0002-Divide-SLRU-buffers-into-n-associative-banks.patch). With this approach we will have to rename s/bank/partition/g for consistency with lock and buffer partitions. I really liked having my own banks, but consistency is worth it anyway.

Thanks!

Best regards, Andrey Borodin.
RE: suboverflowed subtransactions concurrency performance optimize
Hi Andrey,

Thanks a lot for your reply and the reference information.

The default NUM_SUBTRANS_BUFFERS is 32. In my implementation, local_cache_subtrans_pages can be adjusted dynamically. If we configure local_cache_subtrans_pages as 64, each backend uses only 64*8192 = 512KB of extra memory. So the local cache is similar to a first-level cache, and the subtrans SLRU is the second-level cache. I think the extra memory is well worth it. It really resolves the massive subtransaction stuck issue which I mentioned in my previous email.

I have reviewed the patch [0] before. Adding GUC parameters for the SLRU buffer sizes is very nice. But I think for subtrans this optimization is not enough. In SubTransGetTopmostTransaction, we should acquire SubtransSLRULock once and then call SubTransGetParent in a loop, to avoid acquiring/releasing SubtransSLRULock on every iteration of SubTransGetTopmostTransaction->SubTransGetParent. After I applied this patch together with my SubTransGetTopmostTransaction optimization, my test case still got stuck.

About the [1] solution: actually, we first tried to use the buffer manager to replace SLRU for subtrans too, and we implemented it. With the test case which I mentioned in my previous mail, it was still stuck. By default there are 2048 subtransaction entries in one page. When many processes look up the top transaction in one page, they pin/unpin and lock/unlock the same page repeatedly. I found that some backends were stuck at pin/unpin of the page.

Comparison of test results, pgbench with subtrans_128.sql:

Concurrency   PG master   PG master with patch [0]   Local cache optimization
300           stuck       stuck                      no stuck
500           stuck       stuck                      no stuck
1000          stuck       stuck                      no stuck

Maybe we can test the different approaches with my test case. For massive concurrency, an approach that does not get stuck would be a good solution.
[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com

Thanks
Pengcheng

-----Original Message-----
From: Andrey Borodin
Sent: 30 August 2021, 18:25
To: Pengchengliu
Cc: pgsql-hack...@postgresql.org
Subject: Re: suboverflowed subtransactions concurrency performance optimize

Hi Pengcheng!

You are solving an important problem, thank you!

> On 30 Aug 2021, at 13:43, Pengchengliu wrote:
>
> To resolve this performance problem, we think about a solution which
> caches the subtrans SLRU in a local cache.
> First we query the parent transaction id from the subtrans SLRU, and copy
> the SLRU page to a local cache page.
> After that, if we need to query a parent transaction id again, we can query
> it from the local cache directly.

A copy of the SLRU in each backend's cache can consume a lot of memory. Why create a copy if we can optimise the shared representation of the SLRU?

JFYI there is a related patch to make SimpleLruReadPage_ReadOnly() faster for bigger SLRU buffers [0].

Also Nik Samokhvalov recently published an interesting investigation on the topic, but for some reason his message did not pass moderation. [1]

Also it's important to note that there was a community request to move SLRUs to shared_buffers [2].

Thanks!

Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/34/2627/
[1] https://www.postgresql.org/message-id/flat/BE73A0BB-5929-40F4-BAF8-55323DE39561%40yandex-team.ru
[2] https://www.postgresql.org/message-id/flat/20180814213500.GA74618%4060f81dc409fc.ant.amazon.com
suboverflowed subtransactions concurrency performance optimize
Hi hackers,

I wrote a patch to resolve the subtransaction concurrency performance problem when suboverflowed. When we use more than PGPROC_MAX_CACHED_SUBXIDS (64) subtransactions per transaction concurrently, it leads to a subtransaction performance problem: all backends get stuck acquiring the LWLock SubtransSLRULock.

Steps to reproduce on the PG master branch:

1, Init a cluster and append postgresql.conf as below:

max_connections = '2500'
max_files_per_process = '2000'
max_locks_per_transaction = '64'
max_parallel_maintenance_workers = '8'
max_parallel_workers = '60'
max_parallel_workers_per_gather = '6'
max_prepared_transactions = '15000'
max_replication_slots = '10'
max_wal_senders = '64'
max_worker_processes = '250'
shared_buffers = 8GB

2, Create a table and insert some records as below:

CREATE UNLOGGED TABLE contend (
    id integer,
    val integer NOT NULL
) WITH (fillfactor='50');

INSERT INTO contend (id, val)
SELECT i, 0 FROM generate_series(1, 1) AS i;

VACUUM (ANALYZE) contend;

3, Use the script subtrans_128.sql in the attachment with pgbench as below:

pgbench -d postgres -p 33800 -n -r -f subtrans_128.sql -c 500 -j 500 -T 3600

4, After a while, we get the stuck result. Querying pg_stat_activity shows that all backends' wait event is SubtransSLRULock. We can use perf top to find the root cause. The result of perf top is as below:

 66.20%  postgres  [.] pg_atomic_compare_exchange_u32_impl
 29.30%  postgres  [.] pg_atomic_fetch_sub_u32_impl
  1.46%  postgres  [.] pg_atomic_read_u32
  1.34%  postgres  [.] TransactionIdIsCurrentTransactionId
  0.75%  postgres  [.] SimpleLruReadPage_ReadOnly
  0.14%  postgres  [.] LWLockAttemptLock
  0.14%  postgres  [.] LWLockAcquire
  0.12%  postgres  [.] pg_atomic_compare_exchange_u32
  0.09%  postgres  [.] HeapTupleSatisfiesMVCC
  0.06%  postgres  [.] heapgetpage
  0.03%  postgres  [.] sentinel_ok
  0.03%  postgres  [.] XidInMVCCSnapshot
  0.03%  postgres  [.] slot_deform_heap_tuple
  0.03%  postgres  [.] ExecInterpExpr
  0.02%  postgres  [.] AllocSetCheck
  0.02%  postgres  [.] HeapTupleSatisfiesVisibility
  0.02%  postgres  [.] LWLockRelease
  0.02%  postgres  [.] TransactionIdPrecedes
  0.02%  postgres  [.] SubTransGetParent
  0.01%  postgres  [.] heapgettup_pagemode
  0.01%  postgres  [.] CheckForSerializableConflictOutNeeded

After reviewing the subtrans code, it is easy to see that the global LWLock SubtransSLRULock is the bottleneck for concurrent subtrans access. When a backend session assigns more than PGPROC_MAX_CACHED_SUBXIDS (64) subtransactions, we get a suboverflowed snapshot. A suboverflowed snapshot does not contain all the data required to determine visibility, so PostgreSQL will occasionally have to resort to pg_subtrans. These pages are cached in shared buffers, but you can see the overhead of looking them up in the high rank of SimpleLruReadPage_ReadOnly in the perf output.

To resolve this performance problem, we thought about a solution which caches the subtrans SLRU in a local cache. First we query the parent transaction id from the subtrans SLRU, and copy the SLRU page to a local cache page. After that, if we need to query a parent transaction id again, we can query it from the local cache directly. This noticeably reduces the number of acquire/release cycles on the LWLock SubtransSLRULock.

From all snapshots we can get the latest xmin. Any transaction id which precedes this xmin must have been committed or aborted, and its parent/top transaction has already been written to the subtrans SLRU. So we can cache those subtrans SLRU pages by copying them into the local cache.

Using the same reproduce steps above, with our patch we cannot reproduce the stuck result. Note that you have to append our GUC parameter to postgresql.conf; this optimization is off by default:
local_cache_subtrans_pages=128

The patch is based on PG master branch commit 0d906b2c0b1f0d625ff63d9ace906556b1c66a68.

Our project is at https://github.com/ADBSQL/AntDB. Welcome to follow AntDB, AsiaInfo's PG-based distributed database product.

Thanks
Pengcheng

subtrans_128.sql
Description: Binary data

diff --git a/src/backend/access/rmgrdesc/standbydesc.c b/src/backend/access/rmgrdesc/standbydesc.c
index 01ee7ac6d2..9b047c863a 100644
--- a/src/backend/access/rmgrdesc/standbydesc.c
+++ b/src/backend/access/rmgrdesc/standbydesc.c
@@ -129,6 +129,8 @@ standby_desc_invalidations(StringInfo buf,
 		appendStringInfo(buf, " relmap db %u", msg->rm.dbId);
 	else if (msg->id == SHAREDINVALSNAPSHOT_ID)
 		appendStringInfo(buf, " snapshot %u", msg->sn.relId);
+	else if (msg->id == SUBTRANS_INVALID_PAGE_ID)
+		appendStringInfo(buf, " subtrans page %u", msg->spp.pageno);
 	else
RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Greg,

Thanks a lot for your explanation and your fix. I think your fix can resolve the core dump issue, as with your fix the parallel worker resets TransactionXmin from the ActiveSnapshot. But it changes the transaction snapshot for all parallel scenarios, and I don't know whether it brings in other issues. For testing only, I think it is enough.

So, can anybody explain what exactly the difference is between the ActiveSnapshot and the TransactionSnapshot in a parallel worker process? Then maybe we can find a better solution and try to fix it properly.

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow
Sent: 18 May 2021, 17:15
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 18, 2021 at 11:27 AM Pengchengliu wrote:
>
> Hi Greg,
>
> Actually I am very confused about ActiveSnapshot and TransactionSnapshot.
> I don't know why the main process sends the ActiveSnapshot and the
> TransactionSnapshot separately. And what is the exact difference between
> them? If you know, could you explain it for me? It would be very much
> appreciated.

In the context of a parallel worker, I am a little confused too, so I can't explain it either.
It is not really explained in the file "src\backend\access\transam\README.parallel"; it only mentions the following as part of the state that needs to be copied to each worker:

  - The transaction snapshot.
  - The active snapshot, which might be different from the transaction snapshot.

So they might be different, but exactly when and why?
When I debugged a typical parallel-SELECT case, I found that prior to plan execution, GetTransactionSnapshot() was called and its return value was stored in both the QueryDesc and the estate (es_snapshot), which was then pushed on the ActiveSnapshot stack. So by the time InitializeParallelDSM() was called, the (top) ActiveSnapshot was the last snapshot returned from GetTransactionSnapshot().
So why InitializeParallelDSM() calls GetTransactionSnapshot() again is not clear to me (because isn't the ActiveSnapshot then a potentially earlier snapshot? - which it shouldn't be, AFAIK. And also, it's then different from the non-parallel case).

> Before we know them exactly, I think we should not modify the
> TransactionSnapshot to the ActiveSnapshot in the main process. If we do,
> the main process should send the ActiveSnapshot only.

I think it would be worth you trying my suggested change (if you have a development environment, which I assume you have). Sure, IF it was deemed a proper solution, you'd only send the one snapshot, and adjust accordingly in ParallelWorkerMain(), but we need not worry about that in order to test it.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Greg,

> I actually think that the Assert in SubTransGetTopmostTransaction() is
> correct, but in the parallel-worker case, the snapshots are not being
> setup correctly.

I agree with you that the Assert in SubTransGetTopmostTransaction() is correct. The root cause is that TransactionXmin is not being set up correctly in the parallel worker.

Actually I am very confused about ActiveSnapshot and TransactionSnapshot. I don't know why the main process sends the ActiveSnapshot and the TransactionSnapshot separately. And what is the exact difference between them? If you know, could you explain it for me? It would be very much appreciated.

Before we know them exactly, I think we should not modify the TransactionSnapshot to the ActiveSnapshot in the main process. If we do, the main process should send the ActiveSnapshot only.

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow
Sent: 17 May 2021, 20:59
To: 刘鹏程
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Sat, May 15, 2021 at 12:37 PM 刘鹏程 wrote:
>
> BTW, I tested it on a high-performance server, where it is very easily
> reproduced. My colleague and I used different environments and both could
> reproduce it.

Hi Pengcheng,

Although the issue won't reproduce easily on my system, I can certainly see how, for the snapshots used in the parallel worker case, the Active snapshot used is potentially an earlier snapshot than the Transaction snapshot.

I don't know why it is getting a newer Transaction snapshot in InitializeParallelDSM(), when it has previously pushed the return value of GetTransactionSnapshot() as the Active snapshot. So I too hope Tom or Robert can explain what is going on here and how to resolve it (as you requested them to, in your other post).

I actually think that the Assert in SubTransGetTopmostTransaction() is correct, but in the parallel-worker case, the snapshots are not being set up correctly. Can you try the trivial change below and see if it prevents the coredump?
Regards,
Greg Nancarrow
Fujitsu Australia

diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c
index 14a8690019..870889053f 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -216,7 +216,7 @@ InitializeParallelDSM(ParallelContext *pcxt)
 	int			i;
 	FixedParallelState *fps;
 	dsm_handle	session_dsm_handle = DSM_HANDLE_INVALID;
-	Snapshot	transaction_snapshot = GetTransactionSnapshot();
+	Snapshot	transaction_snapshot = GetActiveSnapshot();
 	Snapshot	active_snapshot = GetActiveSnapshot();
 
 	/* We might be running in a very short-lived memory context. */
RE: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Tom & Robert,

Could you review the Assert(TransactionIdFollowsOrEquals(xid, TransactionXmin)) in SubTransGetTopmostTransaction? I think this assert is unsuitable for a parallel worker process. We discussed it before at
https://www.postgresql-archive.org/Parallel-scan-with-SubTransGetTopmostTransaction-assert-coredump-td6197408.html

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow
Sent: 15 May 2021, 0:44
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 8:36 PM Pengchengliu wrote:
> Did you use pgbench with the script sub_120.sql which I provided in the
> attachment?

yes

> Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change
> any code; we just use the original code in PG13.2.

No, I have made no source code changes at all. That was my suggestion, for you to try - because if the problem is avoided by increasing PGPROC_MAX_CACHED_SUBXIDS (to say 128) then it probably indicates the overflow condition is affecting the xmin/xmax of the two snapshots such that it invalidates the condition that is asserted.

I think one problem is that in your settings, you haven't set "max_worker_processes", yet have set "max_parallel_workers = 128". I'm finding no more than 8 parallel workers are actually active at any one time. On top of this, you've got pgbench running with 200 concurrent clients. So many queries are actually executing parallel plans without using parallel workers, as the workers can't actually be launched (and this is probably why I'm finding it hard to reproduce the issue, if the problem involves snapshot suboverflow and parallel workers).
I find that the following settings improve the parallelism per query and the whole test runs very much faster:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=4
max_parallel_workers = 100
max_worker_processes = 128

and adjust the pgbench command-line:

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 25 -j 25 -T 1800

Problem is, I still get no coredump when using this.
Can you try these settings and let me know if the crash still happens when you use them?

I also tried:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=2
max_parallel_workers = 280
max_worker_processes = 300

and the pgbench command-line:

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 140 -j 140 -T 1800

- but I still get no coredump.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Greg,

When you get different xmins between the active snapshot and the transaction snapshot, there may still be no coredump, as there may be no tuple whose xmin lies between ActiveSnapshot->xmin and TransactionSnapshot->xmin that needs to be scanned in the parallel worker. There is no doubt, though, that it is very likely that ActiveSnapshot->xmin precedes TransactionSnapshot->xmin.

For this coredump, we must make sure the scan is parallel and the snapshot has suboverflowed. If the snapshot has not overflowed, you cannot get the coredump, as the coredump comes from the MVCC check during a parallel scan when the snapshot is suboverflowed.

Did you use pgbench with the script sub_120.sql which I provided in the attachment? The default PGPROC_MAX_CACHED_SUBXIDS is 64. In the script sub_120.sql, one transaction uses 120 subtransactions, which is much larger than 64, so while getting the snapshot it must overflow. I really don't know why your snapshot does not overflow. Did you increase the number PGPROC_MAX_CACHED_SUBXIDS? Please don't change any code; we just use the original code in PG13.2. I have checked the code on the master branch, and there is no change to this mechanism, so this issue should still exist.

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow
Sent: 14 May 2021, 16:47
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Fri, May 14, 2021 at 12:25 PM Pengchengliu <pengcheng...@tju.edu.cn> wrote:
>
> Hi Greg,
>
> Thanks for your reply and test.
>
> When the main process gets the transaction snapshot in
> InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot
> xmin very likely follows the active snapshot xmin.
>
> Use gdb and it is easy to verify it. Create the env as below:
>
> 1, Use PG13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster
> database.
> 2, Append the postgres.conf as below:
>
> max_connections = 2000
> parallel_setup_cost=0
> parallel_tuple_cost=0
> min_parallel_table_scan_size=0
> max_parallel_workers_per_gather=8
> max_parallel_workers = 128
>
> 3, Start the cluster database. Use the init_test.sql script in the
> attachment to create some test tables.
> 4, Use the sub_120.sql script in the attachment with pgbench to test it.
>
> pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 1800
>
> Then you can log in to the database, and use GDB to verify it.
> 1, First use explain, and make sure force parallel is OK.
>
> postgres=# explain (verbose,analyze) select count(*) from contend1;
>                                   QUERY PLAN
> -----------------------------------------------------------------------------
>  Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
>    Output: count(*)
>    ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
>          Output: (PARTIAL count(*))
>          Workers Planned: 8
>          Workers Launched: 0
>          ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
>                Output: PARTIAL count(*)
>                ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
>                      Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
>  Planning Time: 0.123 ms
>  Execution Time: 1075.588 ms
>
> postgres=# select pg_backend_pid();
>  pg_backend_pid
> ----------------
>         2182678
>
> 2, Use gdb to debug our backend process. Add the breakpoint at
> parallel.c:219 and continue.
>
> gdb -q -p 2182678
> ...
> (gdb) b parallel.c:219
> Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
> (gdb) c
> Continuing.
>
> 3, In the psql client, we can execute the explain command in step 1 again.
RE: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Greg,

Thanks for your reply and test.

When the main process gets the transaction snapshot in InitializeParallelDSM->GetTransactionSnapshot, the transaction snapshot xmin very likely follows the active snapshot xmin.

Use gdb and it is easy to verify it. Create the env as below:

1, Use PG13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), init a cluster database.
2, Append the postgres.conf as below:

max_connections = 2000
parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=8
max_parallel_workers = 128

3, Start the cluster database. Use the init_test.sql script in the attachment to create some test tables.
4, Use the sub_120.sql script in the attachment with pgbench to test it.

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 1800

Then you can log in to the database, and use GDB to verify it.

1, First use explain, and make sure force parallel is OK.

postgres=# explain (verbose,analyze) select count(*) from contend1;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Finalize Aggregate  (cost=12006.11..12006.12 rows=1 width=8) (actual time=1075.214..1075.449 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=12006.08..12006.09 rows=8 width=8) (actual time=1075.198..1075.433 rows=1 loops=1)
         Output: (PARTIAL count(*))
         Workers Planned: 8
         Workers Launched: 0
         ->  Partial Aggregate  (cost=12006.08..12006.09 rows=1 width=8) (actual time=1074.674..1074.676 rows=1 loops=1)
               Output: PARTIAL count(*)
               ->  Parallel Seq Scan on public.contend1  (cost=0.00..11690.06 rows=126406 width=0) (actual time=0.008..587.454 rows=1010200 loops=1)
                     Output: id, val, c2, c3, c4, c5, c6, c7, c8, c9, c10, crt_time
 Planning Time: 0.123 ms
 Execution Time: 1075.588 ms

postgres=# select pg_backend_pid();
 pg_backend_pid
----------------
        2182678

2, Use gdb to debug our backend process. Add the breakpoint at parallel.c:219 and continue.

gdb -q -p 2182678
...
(gdb) b parallel.c:219
Breakpoint 1 at 0x55d085: file /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c, line 219.
(gdb) c
Continuing.

3, In the psql client, we can execute the explain command in step 1 again. After we get the breakpoint in gdb, we wait a moment, then we execute "next". Use gdb to check active_snapshot and transaction_snapshot: active_snapshot->xmin is 158987 and transaction_snapshot->xmin is 162160. When I used gdb to test it, sometimes active_snapshot was the same as transaction_snapshot; in that case try it multiple times, and before executing "next", try waiting a longer time.

Breakpoint 1, InitializeParallelDSM (pcxt=0x2d53670)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:219
219             Snapshot        transaction_snapshot = GetTransactionSnapshot();
(gdb) n
220             Snapshot        active_snapshot = GetActiveSnapshot();
(gdb)
223             oldcontext = MemoryContextSwitchTo(TopTransactionContext);
(gdb) p *transaction_snapshot
$1 = {snapshot_type = SNAPSHOT_MVCC, xmin = 162160, xmax = 183011, xip = 0x2d50d10, xcnt = 179, subxip = 0x148a9cddf010, subxcnt = 0,
  suboverflowed = true, takenDuringRecovery = false, copied = false, curcid = 0, speculativeToken = 0, active_count = 0, regd_count = 0,
  ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x0}, whenTaken = 0, lsn = 0}
(gdb) p *active_snapshot
$2 = {snapshot_type = SNAPSHOT_MVCC, xmin = 158987, xmax = 173138, xip = 0x2d53288, xcnt = 178, subxip = 0x0, subxcnt = 0,
  suboverflowed = true, takenDuringRecovery = false, copied = true, curcid = 0, speculativeToken = 0, active_count = 1, regd_count = 2,
  ph_node = {first_child = 0x0, next_sibling = 0x0, prev_or_parent = 0x2d52e48}, whenTaken = 0, lsn = 0}
(gdb)

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow
Sent: 13 May 2021, 22:15
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Thu, May 13, 2021 at 11:25 AM Pengchengliu <pengcheng...@tju.edu.cn> wrote:
>
> Hi Andres,
> Thanks for your reply.
Er, it's Greg who has replied so far (not Andres).

> And if you still cannot reproduce it in 2 minutes, could you run pgbench
> for a longer time, such as 30 or 60 minutes?

Actually, I d
RE: Parallel scan with SubTransGetTopmostTransaction assert coredump
nSnapshot->GetSnapshotData.
2, According to PARALLEL_KEY_TRANSACTION_SNAPSHOT (xmin 799425, xmax 822293) from the main process, it sets TransactionXmin to 799425 in ParallelWorkerMain->RestoreTransactionSnapshot->SetTransactionSnapshot->ProcArrayInstallRestoredXmin.
3, ExecParallelInitializeWorker->ExecSeqScanInitializeWorker->table_beginscan_parallel gets the active snapshot (xmin 799162, xmax 822061) from the main process, and sets this snapshot to scan->rs_base.rs_snapshot.
4, The parallel scan begins with the active snapshot (xmin 799162, xmax 822061) and TransactionXmin 799425. When it scans a tuple (xmin 799225), the SubTransGetTopmostTransaction assert is hit, in HeapTupleSatisfiesMVCC->XidInMVCCSnapshot->SubTransGetTopmostTransaction.

As the main process gets an active snapshot (xmin 799162, xmax 822061) which is earlier than the transaction snapshot (xmin 799425, xmax 822293), the parallel worker process sets TransactionXmin from the transaction snapshot (xmin 799425, xmax 822293), but scans tuples with the active snapshot (xmin 799162, xmax 822061).

Thanks
Pengcheng

-----Original Message-----
From: Greg Nancarrow
Sent: 11 May 2021, 19:08
To: Pengchengliu
Cc: Andres Freund ; PostgreSQL-development
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

On Tue, May 11, 2021 at 11:28 AM Pengchengliu <pengcheng...@tju.edu.cn> wrote:
>
> Hi Andres,
> Reproduce steps:
>
> 1, Modify NUM_SUBTRANS_BUFFERS to 128 from 32 in the file
> "src/include/access/subtrans.h", line 15.
> 2, Configure with asserts enabled and build it.
> 3, Init a new database cluster.
> 4, Modify postgres.conf and add some parameters as below. As the coredump
> comes from a parallel scan, we adjust the parallel settings to make it easy
> to reproduce.
>
> max_connections = 2000
>
> parallel_setup_cost=0
> parallel_tuple_cost=0
> min_parallel_table_scan_size=0
> max_parallel_workers_per_gather=8
> max_parallel_workers = 32
>
> 5, Start the database cluster.
> 6, Use the script init_test.sql in the attachment to create tables.
> 7, Use pgbench with the script sub_120.sql in the attachment to test it.
> Try it a few times and you should get the coredump file.
> pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 120

Hi,
I had a go at reproducing your reported issue, making sure to follow all your steps. Unfortunately, your script seemed to run OK with pgbench and no crash/coredump occurred for me (and yes, I definitely had asserts enabled). I tried with both the 13.2 source code (3fb4c75e857adee3da4386e947ba58a75f3e74b7), running through the script with pgbench twice to completion, and also did the same using the latest Postgres source code.

It will be interesting to see if anyone is able to reproduce your issue.

Regards,
Greg Nancarrow
Fujitsu Australia
RE: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Andres,

Reproduce steps:

1, Modify NUM_SUBTRANS_BUFFERS to 128 from 32 in the file "src/include/access/subtrans.h", line 15.
2, Configure with asserts enabled and build it.
3, Init a new database cluster.
4, Modify postgres.conf and add some parameters as below. As the coredump comes from a parallel scan, we adjust the parallel settings to make it easy to reproduce.

max_connections = 2000

parallel_setup_cost=0
parallel_tuple_cost=0
min_parallel_table_scan_size=0
max_parallel_workers_per_gather=8
max_parallel_workers = 32

5, Start the database cluster.
6, Use the script init_test.sql in the attachment to create tables.
7, Use pgbench with the script sub_120.sql in the attachment to test it. Try it a few times and you should get the coredump file.

pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 120

Thanks
Pengcheng

-----Original Message-----
From: Andres Freund
Sent: 7 May 2021, 11:55
To: Pengchengliu
Cc: pgsql-hack...@postgresql.org
Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump

Hi,

On 2021-05-07 11:32:57 +0800, Pengchengliu wrote:
> Hi Hackers,
>
> Last email, format error, missing some information, so I resend this email.
>
> With PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested
> subtransactions with parallel scan, and I got a subtransaction coredump as
> below:

> So the root cause is that the parallel worker process sets TransactionXmin
> with the later transaction snapshot, but during the parallel scan the
> parallel worker process uses the older active snapshot.
>
> It leads to the subtrans assert coredump. I don't know how to fix it. Are
> there any ideas?

Do you have steps to reliably reproduce this?

Greetings,

Andres Freund

test_script.tar
Description: Unix tar archive
Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Hackers,

Last email had a format error and was missing some information, so I resend this email.

With PG 13.2 (3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested subtransactions with parallel scan, and I got a subtransaction coredump as below:

```
(gdb) bt
#0  0x1517ce61f7ff in raise () from /lib64/libc.so.6
#1  0x1517ce609c35 in abort () from /lib64/libc.so.6
#2  0x00aaf93f in ExceptionalCondition (conditionName=0xb4c920 "TransactionIdFollowsOrEquals(xid, TransactionXmin)", errorType=0xb4c796 "FailedAssertion",
    fileName=0xb4c738 "/home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c", lineNumber=156)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/error/assert.c:67
#3  0x00563111 in SubTransGetTopmostTransaction (xid=196963)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/subtrans.c:156
#4  0x00b05206 in XidInMVCCSnapshot (xid=196963, snapshot=0x2f8ec58)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/utils/time/snapmgr.c:2293
#5  0x004ff2bc in HeapTupleSatisfiesMVCC (htup=0x7ffc21807120, snapshot=0x2f8ec58, buffer=1946)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1073
#6  0x00500363 in HeapTupleSatisfiesVisibility (tup=0x7ffc21807120, snapshot=0x2f8ec58, buffer=1946)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam_visibility.c:1695
#7  0x004e423b in heapgetpage (sscan=0x2f8e840, page=1685)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:447
#8  0x004e68bf in heapgettup_pagemode (scan=0x2f8e840, dir=ForwardScanDirection, nkeys=0, key=0x0)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1077
#9  0x004e6eb6 in heap_getnextslot (sscan=0x2f8e840, direction=ForwardScanDirection, slot=0x2fd8a38)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/heap/heapam.c:1333
#10 0x0075350e in table_scan_getnextslot (sscan=0x2f8e840, direction=ForwardScanDirection, slot=0x2fd8a38)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/access/tableam.h:906
#11 0x007535d6 in SeqNext (node=0x2fd86a8)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:80
#12 0x0071af3c in ExecScanFetch (node=0x2fd86a8, accessMtd=0x753542 , recheckMtd=0x7535e7 )
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:133
#13 0x0071afdd in ExecScan (node=0x2fd86a8, accessMtd=0x753542 , recheckMtd=0x7535e7 )
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execScan.c:199
#14 0x00753631 in ExecSeqScan (pstate=0x2fd86a8)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/nodeSeqscan.c:112
#15 0x007173af in ExecProcNodeFirst (node=0x2fd86a8)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execProcnode.c:450
#16 0x0070b7f7 in ExecProcNode (node=0x2fd86a8)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/include/executor/executor.h:248
#17 0x0070e302 in ExecutePlan (estate=0x2fd7ca0, planstate=0x2fd86a8, use_parallel_mode=false, operation=CMD_SELECT,
    sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x2f8ddd8, execute_once=true)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:1632
#18 0x0070be22 in standard_ExecutorRun (queryDesc=0x2f8e7a8, direction=ForwardScanDirection, count=0, execute_once=true)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:350
#19 0x0070bc50 in ExecutorRun (queryDesc=0x2f8e7a8, direction=ForwardScanDirection, count=0, execute_once=true)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execMain.c:294
#20 0x007131d5 in ParallelQueryMain (seg=0x2ef30b8, toc=0x1517cf85c000)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/executor/execParallel.c:1448
#21 0x0055f70c in ParallelWorkerMain (main_arg=897455922)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/access/transam/parallel.c:1470
#22 0x0086e255 in StartBackgroundWorker ()
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/bgworker.c:879
#23 0x0088192c in do_start_bgworker (rw=0x2f221c0)
    at /home/liupc/build/build_postgres2/../../devel/postgres2/src/backend/postmaster/postmaster.c:5870
#24 0x00881cd8 in maybe_start_bgworkers () at
```