On Wed, Jul 10, 2024 at 10:39 PM vignesh C <vignes...@gmail.com> wrote: > > On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit.kapil...@gmail.com> wrote: > > The patch missed to use the ShareRowExclusiveLock for partitions, see > > attached. I haven't tested it but they should also face the same > > problem. Apart from that, I have changed the comments in a few places > > in the patch. > > I could not hit the updated ShareRowExclusiveLock changes through the > partition table, instead I could verify it using the inheritance > table. Added a test for the same and also attaching the backbranch > patch. >
Hi, I tested alternative-experimental-fix-lock.patch provided by Tomas (replaces SUE with SRE in OpenTableList). I believe there are a couple of scenarios the patch does not cover. 1. It doesn't handle the case of "ALTER PUBLICATION <pub> ADD TABLES IN SCHEMA <schema>". I took crash-test.sh provided by Tomas and modified it to add all tables in the schema to publication using the following command : ALTER PUBLICATION p ADD TABLES IN SCHEMA public The modified script is attached (crash-test-with-schema.sh). With this script, I can reproduce the issue even with the patch applied. This is because the code path to add a schema to the publication doesn't go through OpenTableList. I have also attached a script run-test-with-schema.sh to run crash-test-with-schema.sh in a loop with randomly generated parameters (modified from run.sh provided by Tomas). 2. The second issue is a deadlock which happens when the alter publication command is run for a comma separated list of tables. I created another script create-test-tables-order-reverse.sh. This script runs a command like the following : ALTER PUBLICATION p ADD TABLE test_2,test_1 Running the above script, I was able to get a deadlock error (the output is attached in deadlock.txt). In the alter publication command, I added the tables in the reverse order to increase the probability of the deadlock. But it should happen with any order of tables. I am not sure if the deadlock is a major issue because detecting the deadlock is better than data loss. The schema issue is probably more important. I didn't test it out with the latest patches sent by Vignesh but since the code changes in that patch are also in OpenTableList, I think the schema scenario won't be covered by those. Thanks & Regards, Nitin Motiani Google
nitinmotiani@nitinmotiani:~/vanila_postgres/postgresql$ ./crash-test-tables-order-reverse.sh test-data-dir-reverse-proper 2 2 5 /usr/local/pgsql/bin:/usr/local/pgsql/bin:/usr/local/pgsql/bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin [Wed Jul 10 07:50:10 AM UTC 2024] [1720597810] NUMTABLES=2 SLEEP=5 REFRESH=2 The files belonging to this database system will be owned by user "nitinmotiani". This user must also own the server process. The database cluster will be initialized with locale "en_US.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. creating directory test-data-dir-reverse-proper/data-pub ... ok creating subdirectories ... ok selecting dynamic shared memory implementation ... posix selecting default "max_connections" ... 100 selecting default "shared_buffers" ... 128MB selecting default time zone ... Etc/UTC creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... ok syncing data to disk ... ok initdb: warning: enabling "trust" authentication for local connections initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb. Success. You can now start the database server using: /usr/local/pgsql/bin/pg_ctl -D test-data-dir-reverse-proper/data-pub -l logfile start The files belonging to this database system will be owned by user "nitinmotiani". This user must also own the server process. The database cluster will be initialized with locale "en_US.UTF-8". The default database encoding has accordingly been set to "UTF8". The default text search configuration will be set to "english". Data page checksums are disabled. creating directory test-data-dir-reverse-proper/data-sub ... ok creating subdirectories ... ok selecting dynamic shared memory implementation ... posix selecting default "max_connections" ... 100 selecting default "shared_buffers" ... 128MB selecting default time zone ... Etc/UTC creating configuration files ... ok running bootstrap script ... ok performing post-bootstrap initialization ... ok syncing data to disk ... ok initdb: warning: enabling "trust" authentication for local connections initdb: hint: You can change this by editing pg_hba.conf or using the option -A, or --auth-local and --auth-host, the next time you run initdb. Success. You can now start the database server using: /usr/local/pgsql/bin/pg_ctl -D test-data-dir-reverse-proper/data-sub -l logfile start waiting for server to start.... done server started waiting for server to start.... done server started CREATE TABLE CREATE TABLE CREATE TABLE CREATE TABLE [Wed Jul 10 07:50:12 AM UTC 2024] [1720597812] creating publication CREATE PUBLICATION [Wed Jul 10 07:50:12 AM UTC 2024] [1720597812] sleeping [Wed Jul 10 07:50:17 AM UTC 2024] [1720597817] creating subscription NOTICE: created replication slot "s" on publisher CREATE SUBSCRIPTION [Wed Jul 10 07:50:17 AM UTC 2024] [1720597817] sleeping Printing bulk alter command ALTER PUBLICATION p ADD TABLE test_2,test_1 ERROR: deadlock detected DETAIL: Process 524333 waits for ShareRowExclusiveLock on relation 16386 of database 16384; blocked by process 524298. Process 524298 waits for RowExclusiveLock on relation 16395 of database 16384; blocked by process 524333. HINT: See server log for query details. [Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] adding table 2 to publication [0/3669BC0] [Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] refreshing publication [0/366F9D8] ALTER SUBSCRIPTION [Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] adding table 1 to publication [0/36826A8] [Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] refreshing publication [0/3688A98] ALTER SUBSCRIPTION [Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] waiting for subscription [Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] synced 0 tables out of 2 [Wed Jul 10 07:50:24 AM UTC 2024] [1720597824] waiting for subscription [Wed Jul 10 07:50:24 AM UTC 2024] [1720597824] synced 0 tables out of 2 [Wed Jul 10 07:50:25 AM UTC 2024] [1720597825] waiting for subscription [Wed Jul 10 07:50:25 AM UTC 2024] [1720597825] synced 0 tables out of 2 [Wed Jul 10 07:50:26 AM UTC 2024] [1720597826] waiting for subscription [Wed Jul 10 07:50:26 AM UTC 2024] [1720597826] synced 0 tables out of 2 [Wed Jul 10 07:50:27 AM UTC 2024] [1720597827] waiting for subscription [Wed Jul 10 07:50:27 AM UTC 2024] [1720597827] synced 0 tables out of 2 [Wed Jul 10 07:50:28 AM UTC 2024] [1720597828] waiting for subscription [Wed Jul 10 07:50:28 AM UTC 2024] [1720597828] synced 0 tables out of 2 [Wed Jul 10 07:50:29 AM UTC 2024] [1720597829] waiting for subscription [Wed Jul 10 07:50:29 AM UTC 2024] [1720597829] synced 0 tables out of 2 [Wed Jul 10 07:50:30 AM UTC 2024] [1720597830] waiting for subscription [Wed Jul 10 07:50:30 AM UTC 2024] [1720597830] synced 0 tables out of 2 [Wed Jul 10 07:50:31 AM UTC 2024] [1720597831] waiting for subscription [Wed Jul 10 07:50:31 AM UTC 2024] [1720597831] synced 0 tables out of 2 [Wed Jul 10 07:50:32 AM UTC 2024] [1720597832] waiting for subscription [Wed Jul 10 07:50:32 AM UTC 2024] [1720597832] synced 0 tables out of 2 [Wed Jul 10 07:50:33 AM UTC 2024] [1720597833] waiting for subscription [Wed Jul 10 07:50:33 AM UTC 2024] [1720597833] synced 0 tables out of 2 [Wed Jul 10 07:50:34 AM UTC 2024] [1720597834] waiting for subscription [Wed Jul 10 07:50:34 AM UTC 2024] [1720597834] synced 0 tables out of 2 [Wed Jul 10 07:50:35 AM UTC 2024] [1720597835] waiting for subscription [Wed Jul 10 07:50:35 AM UTC 2024] [1720597835] synced 0 tables out of 2 [Wed Jul 10 07:50:36 AM UTC 2024] [1720597836] waiting for subscription [Wed Jul 10 07:50:36 AM UTC 2024] [1720597836] synced 0 tables out of 2 [Wed Jul 10 07:50:37 AM UTC 2024] [1720597837] waiting for subscription [Wed Jul 10 07:50:37 AM UTC 2024] [1720597837] synced 0 tables out of 2 [Wed Jul 10 07:50:38 AM UTC 2024] [1720597838] waiting for subscription [Wed Jul 10 07:50:38 AM UTC 2024] [1720597838] synced 0 tables out of 2 [Wed Jul 10 07:50:39 AM UTC 2024] [1720597839] waiting for subscription [Wed Jul 10 07:50:39 AM UTC 2024] [1720597839] synced 0 tables out of 2 [Wed Jul 10 07:50:40 AM UTC 2024] [1720597840] waiting for subscription [Wed Jul 10 07:50:40 AM UTC 2024] [1720597840] synced 0 tables out of 2 [Wed Jul 10 07:50:41 AM UTC 2024] [1720597841] waiting for subscription [Wed Jul 10 07:50:41 AM UTC 2024] [1720597841] synced 0 tables out of 2 [Wed Jul 10 07:50:42 AM UTC 2024] [1720597842] waiting for subscription [Wed Jul 10 07:50:42 AM UTC 2024] [1720597842] synced 0 tables out of 2
crash-test-with-schema.sh
Description: Bourne shell script
run-test-with-schema.sh
Description: Bourne shell script
crash-test-tables-order-reverse.sh
Description: Bourne shell script