On Wed, Jul 10, 2024 at 10:39 PM vignesh C <vignes...@gmail.com> wrote:
>
> On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > The patch missed to use the ShareRowExclusiveLock for partitions, see
> > attached. I haven't tested it but they should also face the same
> > problem. Apart from that, I have changed the comments in a few places
> > in the patch.
>
> I could not hit the updated ShareRowExclusiveLock changes through the
> partition table, instead I could verify it using the inheritance
> table. Added a test for the same and also attaching the backbranch
> patch.
>

Hi,

I tested alternative-experimental-fix-lock.patch provided by Tomas
(replaces SUE with SRE in OpenTableList). I believe there are a couple
of scenarios the patch does not cover.

1. It doesn't handle the case of "ALTER PUBLICATION <pub> ADD TABLES
IN SCHEMA  <schema>".

I took crash-test.sh provided by Tomas and modified it to add all
tables in the schema to publication using the following command :

           ALTER PUBLICATION p ADD TABLES IN SCHEMA  public

The modified script is attached (crash-test-with-schema.sh). With this
script, I can reproduce the issue even with the patch applied. This is
because the code path to add a schema to the publication doesn't go
through OpenTableList.

I have also attached a script run-test-with-schema.sh to run
crash-test-with-schema.sh in a loop with randomly generated parameters
(modified from run.sh provided by Tomas).

2.  The second issue is a deadlock which happens when the alter
publication command is run for a comma separated list of tables.

I created another script create-test-tables-order-reverse.sh. This
script runs a command like the following :

            ALTER PUBLICATION p ADD TABLE test_2,test_1

Running the above script, I was able to get a deadlock error (the
output is attached in deadlock.txt). In the alter publication command,
I added the tables in the reverse order to increase the probability of
the deadlock. But it should happen with any order of tables.

I am not sure if the deadlock is a major issue because detecting the
deadlock is better than data loss. The schema issue is probably more
important. I didn't test it out with the latest patches sent by
Vignesh but since the code changes in that patch are also in
OpenTableList, I think the schema scenario won't be covered by those.

Thanks & Regards,
Nitin Motiani
Google
nitinmotiani@nitinmotiani:~/vanila_postgres/postgresql$ 
./crash-test-tables-order-reverse.sh test-data-dir-reverse-proper 2 2 5
/usr/local/pgsql/bin:/usr/local/pgsql/bin:/usr/local/pgsql/bin:/usr/lib/google-golang/bin:/usr/local/buildtools/java/jdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
[Wed Jul 10 07:50:10 AM UTC 2024] [1720597810] NUMTABLES=2  SLEEP=5  REFRESH=2
The files belonging to this database system will be owned by user 
"nitinmotiani".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory test-data-dir-reverse-proper/data-pub ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option 
-A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/local/pgsql/bin/pg_ctl -D test-data-dir-reverse-proper/data-pub -l 
logfile start

The files belonging to this database system will be owned by user 
"nitinmotiani".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory test-data-dir-reverse-proper/data-sub ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

initdb: warning: enabling "trust" authentication for local connections
initdb: hint: You can change this by editing pg_hba.conf or using the option 
-A, or --auth-local and --auth-host, the next time you run initdb.

Success. You can now start the database server using:

    /usr/local/pgsql/bin/pg_ctl -D test-data-dir-reverse-proper/data-sub -l 
logfile start

waiting for server to start.... done
server started
waiting for server to start.... done
server started
CREATE TABLE
CREATE TABLE
CREATE TABLE
CREATE TABLE
[Wed Jul 10 07:50:12 AM UTC 2024] [1720597812] creating publication
CREATE PUBLICATION
[Wed Jul 10 07:50:12 AM UTC 2024] [1720597812] sleeping
[Wed Jul 10 07:50:17 AM UTC 2024] [1720597817] creating subscription
NOTICE:  created replication slot "s" on publisher
CREATE SUBSCRIPTION
[Wed Jul 10 07:50:17 AM UTC 2024] [1720597817] sleeping
Printing bulk alter command
ALTER PUBLICATION p ADD TABLE test_2,test_1
ERROR:  deadlock detected
DETAIL:  Process 524333 waits for ShareRowExclusiveLock on relation 16386 of 
database 16384; blocked by process 524298.
Process 524298 waits for RowExclusiveLock on relation 16395 of database 16384; 
blocked by process 524333.
HINT:  See server log for query details.
[Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] adding table 2 to publication 
[0/3669BC0]
[Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] refreshing publication 
[0/366F9D8]
ALTER SUBSCRIPTION
[Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] adding table 1 to publication 
[0/36826A8]
[Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] refreshing publication 
[0/3688A98]
ALTER SUBSCRIPTION
[Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] waiting for subscription
[Wed Jul 10 07:50:23 AM UTC 2024] [1720597823] synced 0 tables out of 2
[Wed Jul 10 07:50:24 AM UTC 2024] [1720597824] waiting for subscription
[Wed Jul 10 07:50:24 AM UTC 2024] [1720597824] synced 0 tables out of 2
[Wed Jul 10 07:50:25 AM UTC 2024] [1720597825] waiting for subscription
[Wed Jul 10 07:50:25 AM UTC 2024] [1720597825] synced 0 tables out of 2
[Wed Jul 10 07:50:26 AM UTC 2024] [1720597826] waiting for subscription
[Wed Jul 10 07:50:26 AM UTC 2024] [1720597826] synced 0 tables out of 2
[Wed Jul 10 07:50:27 AM UTC 2024] [1720597827] waiting for subscription
[Wed Jul 10 07:50:27 AM UTC 2024] [1720597827] synced 0 tables out of 2
[Wed Jul 10 07:50:28 AM UTC 2024] [1720597828] waiting for subscription
[Wed Jul 10 07:50:28 AM UTC 2024] [1720597828] synced 0 tables out of 2
[Wed Jul 10 07:50:29 AM UTC 2024] [1720597829] waiting for subscription
[Wed Jul 10 07:50:29 AM UTC 2024] [1720597829] synced 0 tables out of 2
[Wed Jul 10 07:50:30 AM UTC 2024] [1720597830] waiting for subscription
[Wed Jul 10 07:50:30 AM UTC 2024] [1720597830] synced 0 tables out of 2
[Wed Jul 10 07:50:31 AM UTC 2024] [1720597831] waiting for subscription
[Wed Jul 10 07:50:31 AM UTC 2024] [1720597831] synced 0 tables out of 2
[Wed Jul 10 07:50:32 AM UTC 2024] [1720597832] waiting for subscription
[Wed Jul 10 07:50:32 AM UTC 2024] [1720597832] synced 0 tables out of 2
[Wed Jul 10 07:50:33 AM UTC 2024] [1720597833] waiting for subscription
[Wed Jul 10 07:50:33 AM UTC 2024] [1720597833] synced 0 tables out of 2
[Wed Jul 10 07:50:34 AM UTC 2024] [1720597834] waiting for subscription
[Wed Jul 10 07:50:34 AM UTC 2024] [1720597834] synced 0 tables out of 2
[Wed Jul 10 07:50:35 AM UTC 2024] [1720597835] waiting for subscription
[Wed Jul 10 07:50:35 AM UTC 2024] [1720597835] synced 0 tables out of 2
[Wed Jul 10 07:50:36 AM UTC 2024] [1720597836] waiting for subscription
[Wed Jul 10 07:50:36 AM UTC 2024] [1720597836] synced 0 tables out of 2
[Wed Jul 10 07:50:37 AM UTC 2024] [1720597837] waiting for subscription
[Wed Jul 10 07:50:37 AM UTC 2024] [1720597837] synced 0 tables out of 2
[Wed Jul 10 07:50:38 AM UTC 2024] [1720597838] waiting for subscription
[Wed Jul 10 07:50:38 AM UTC 2024] [1720597838] synced 0 tables out of 2
[Wed Jul 10 07:50:39 AM UTC 2024] [1720597839] waiting for subscription
[Wed Jul 10 07:50:39 AM UTC 2024] [1720597839] synced 0 tables out of 2
[Wed Jul 10 07:50:40 AM UTC 2024] [1720597840] waiting for subscription
[Wed Jul 10 07:50:40 AM UTC 2024] [1720597840] synced 0 tables out of 2
[Wed Jul 10 07:50:41 AM UTC 2024] [1720597841] waiting for subscription
[Wed Jul 10 07:50:41 AM UTC 2024] [1720597841] synced 0 tables out of 2
[Wed Jul 10 07:50:42 AM UTC 2024] [1720597842] waiting for subscription
[Wed Jul 10 07:50:42 AM UTC 2024] [1720597842] synced 0 tables out of 2

Attachment: crash-test-with-schema.sh
Description: Bourne shell script

Attachment: run-test-with-schema.sh
Description: Bourne shell script

Attachment: crash-test-tables-order-reverse.sh
Description: Bourne shell script

Reply via email to