On Thu, Dec 14, 2023 at 4:36 PM Dilip Kumar <dilipbal...@gmail.com> wrote:
>
> On Wed, Dec 13, 2023 at 5:49 PM Andrey M. Borodin <x4...@yandex-team.ru>
> wrote:
> >
> > On 12 Dec 2023, at 18:28, Alvaro Herrera <alvhe...@alvh.no-ip.org> wrote:
> > >
> > > Andrey, do you have any stress tests or anything else that you used to
> > > gain confidence in this code?
> >

I have done some more testing of the clog group update. The attached test
file runs two concurrent pgbench scripts: the first is a slow script that
runs 10-second-long transactions, and the second is a very fast script
running at ~10000 transactions per second. Along with that, I have also
changed the bank size so that each bank contains just 1 page, i.e. 32k
transactions per bank. I did it this way so that we do not need to keep
long-running transactions open for very long in order to get transactions
from different banks committed at the same time.

With this test I got that behavior: the logs below show that transactions
from multiple xid ranges, which fall in different SLRU banks (considering
32k transactions per bank), are doing group updates at the same time. For
example, in the logs below we can see that xids around 70600, 70548, and
70558, and xids around 755 and 752, are getting group updates by different
leaders at nearly the same time.

It runs fine over a long duration, but I am not sure how to validate the
sanity of this kind of test.

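As a rough check of the page arithmetic above, the sketch below maps each xid named in the previous paragraph to its clog page. It assumes the default 8kB block size (2 status bits per transaction, so 32768 transactions per page) and the one-page-per-bank setting used in this test. The function name clog_page is made up for illustration and is not part of the patch.

```shell
#!/bin/sh
# Assumption: default BLCKSZ of 8192 and 2 clog status bits per xact,
# which gives 8192 * 4 = 32768 transactions per clog page.
XACTS_PER_PAGE=32768

# Hypothetical helper: print the clog page number that holds a given xid.
clog_page() {
    echo $(( $1 / XACTS_PER_PAGE ))
}

# The xids quoted from the log excerpt in this mail.
for xid in 70600 70548 70558 755 752; do
    echo "xid $xid -> page $(clog_page "$xid")"
done
```

Under these assumptions the 705xx xids fall on page 2 and the 75x xids on page 0. How a page number maps to a bank depends on the number of banks configured by the patch, which this sketch does not model.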
2023-12-14 14:43:31.813 GMT [3306] LOG: group leader procno 606 updated status of procno 606 xid 70600
2023-12-14 14:43:31.816 GMT [3326] LOG: procno 586 for xid 70548 added for group update
2023-12-14 14:43:31.816 GMT [3326] LOG: procno 586 is group leader and got the lock
2023-12-14 14:43:31.816 GMT [3326] LOG: group leader procno 586 updated status of procno 586 xid 70548
2023-12-14 14:43:31.818 GMT [3327] LOG: procno 585 for xid 70558 added for group update
2023-12-14 14:43:31.818 GMT [3327] LOG: procno 585 is group leader and got the lock
2023-12-14 14:43:31.818 GMT [3327] LOG: group leader procno 585 updated status of procno 585 xid 70558
2023-12-14 14:43:31.829 GMT [3155] LOG: procno 687 for xid 752 added for group update
2023-12-14 14:43:31.829 GMT [3207] LOG: procno 669 for xid 755 added for group update
2023-12-14 14:43:31.829 GMT [3155] LOG: procno 687 is group leader and got the lock
2023-12-14 14:43:31.829 GMT [3155] LOG: group leader procno 687 updated status of procno 669 xid 755
2023-12-14 14:43:31.829 GMT [3155] LOG: group leader procno 687 updated status of procno 687 xid 752

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

# Goal of this script is to generate a scenario where some old, long-running
# slow transactions get committed together with new transactions, such that
# they fall in different slru banks

rm -rf pgdata
./initdb -D pgdata
echo "max_wal_size=20GB" >> pgdata/postgresql.conf
echo "shared_buffers=20GB" >> pgdata/postgresql.conf
echo "checkpoint_timeout=40min" >> pgdata/postgresql.conf
echo "max_connections=700" >> pgdata/postgresql.conf
echo "maintenance_work_mem=1GB" >> pgdata/postgresql.conf
echo "subtrans_buffers=64" >> pgdata/postgresql.conf
echo "multixact_members_buffers=128" >> pgdata/postgresql.conf

# create slow_txn.sql script: each transaction stays open for 10 seconds
cat > slow_txn.sql << EOF
BEGIN;
INSERT INTO test VALUES(1);
DELETE FROM test WHERE a=1;
select pg_sleep(10);
COMMIT;
EOF

# create fast_txn.sql script: short transactions that consume xids quickly
cat > fast_txn.sql << EOF
BEGIN;
INSERT INTO test1 VALUES(1);
DELETE FROM test1 WHERE a=1;
COMMIT;
EOF

./pg_ctl -D pgdata -l logfile -c start
./psql -d postgres -c "create table test(a int)"
./psql -d postgres -c "create table test1(a int)"
./pgbench -i -s 1 postgres

# run the slow script in the background and the fast script in the foreground
./pgbench -f slow_txn.sql -c 28 -j 28 -P 1 -T 60 postgres &
./pgbench -f fast_txn.sql -c 100 -j 100 -P 1 -T 60 postgres

# give the last slow transactions time to finish before shutting down
sleep 10
./pg_ctl -D pgdata -l logfile -c stop
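
One possible way to make the resulting log easier to inspect is sketched below. It is not part of the attached test. It assumes the log lines look like the excerpt earlier in this mail (those messages come from debug logging added for this test, not from stock PostgreSQL) and that 32768 transactions fit on one clog page. The function name summarize_group_updates is made up for illustration.

```shell
#!/bin/sh
# Assumption: log lines look like the excerpt in this mail, e.g.
#   ... LOG: group leader procno 687 updated status of procno 669 xid 755
# and 32768 transactions fit on one clog page (default 8kB blocks).

# Hypothetical helper: read server log lines on stdin and print, for each
# "updated status" line, the leader procno, the xid, and the xid's clog page.
summarize_group_updates() {
    awk '
    /group leader procno [0-9]+ updated status of procno [0-9]+ xid [0-9]+/ {
        leader = ""; xid = ""
        # Locate the values by keyword so the log_line_prefix does not matter.
        for (i = 1; i < NF; i++) {
            if ($i == "leader" && $(i + 1) == "procno") leader = $(i + 2)
            if ($i == "xid") xid = $(i + 1)
        }
        printf "leader=%s xid=%s page=%d\n", leader, xid, int(xid / 32768)
    }'
}

# Usage: only runs if the file written by "pg_ctl -l logfile" exists.
if [ -f logfile ]; then
    summarize_group_updates < logfile
fi
```

This only makes the interleaving of leaders and pages easier to eyeball. It does not by itself prove that the status updates were correct.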