On Wed, Jul 10, 2024 at 11:22 PM Nitin Motiani <nitinmoti...@google.com> wrote:
>
> On Wed, Jul 10, 2024 at 10:39 PM vignesh C <vignes...@gmail.com> wrote:
> >
> > On Wed, 10 Jul 2024 at 12:28, Amit Kapila <amit.kapil...@gmail.com> wrote:
> > > The patch missed to use the ShareRowExclusiveLock for partitions, see
> > > attached. I haven't tested it but they should also face the same
> > > problem. Apart from that, I have changed the comments in a few places
> > > in the patch.
> >
> > I could not hit the updated ShareRowExclusiveLock changes through the
> > partition table, instead I could verify it using the inheritance
> > table. Added a test for the same and also attaching the backbranch
> > patch.
> >
>
> Hi,
>
> I tested alternative-experimental-fix-lock.patch provided by Tomas
> (replaces SUE with SRE in OpenTableList). I believe there are a couple
> of scenarios the patch does not cover.
>
> 1. It doesn't handle the case of "ALTER PUBLICATION <pub> ADD TABLES
> IN SCHEMA <schema>".
>
> I took crash-test.sh provided by Tomas and modified it to add all
> tables in the schema to publication using the following command :
>
> ALTER PUBLICATION p ADD TABLES IN SCHEMA public
>
> The modified script is attached (crash-test-with-schema.sh). With this
> script, I can reproduce the issue even with the patch applied. This is
> because the code path to add a schema to the publication doesn't go
> through OpenTableList.
>
> I have also attached a script run-test-with-schema.sh to run
> crash-test-with-schema.sh in a loop with randomly generated parameters
> (modified from run.sh provided by Tomas).
>
> 2. The second issue is a deadlock which happens when the alter
> publication command is run for a comma separated list of tables.
>
> I created another script create-test-tables-order-reverse.sh. This
> script runs a command like the following :
>
> ALTER PUBLICATION p ADD TABLE test_2,test_1
>
> Running the above script, I was able to get a deadlock error (the
> output is attached in deadlock.txt). In the alter publication command,
> I added the tables in the reverse order to increase the probability of
> the deadlock. But it should happen with any order of tables.
>
> I am not sure if the deadlock is a major issue because detecting the
> deadlock is better than data loss. The schema issue is probably more
> important. I didn't test it out with the latest patches sent by
> Vignesh but since the code changes in that patch are also in
> OpenTableList, I think the schema scenario won't be covered by those.
>
Hi,

I looked further into the scenario of adding the tables in a schema to the
publication. Since in that case the entry is added to
pg_publication_namespace instead of pg_publication_rel, the code paths for
'add table' and 'add tables in schema' are different. And in the 'add tables
in schema' scenario, the OpenTableList function is not called to get the
relation ids. Therefore, even with the proposed patch, the data loss issue
still persists in that case.

To validate this idea, I tried locking all the affected tables in the schema
just before the invalidation for those relations (in ShareRowExclusiveLock
mode). I am attaching a small patch for that (alter_pub_for_schema.patch);
the change is made in the function publication_add_schema in
pg_publication.c. I am not sure if this is the best place to make this change
or if it is the right fix. It is conceptually similar to the proposed change
in OpenTableList, but here we are not just changing the lock mode; we are
taking locks which were not taken before. With this change, though, the data
loss errors went away in my test script.

Another issue which persists with this change is the deadlock. Since multiple
table locks are acquired, the test script detects a deadlock a few times.
Therefore I'm also attaching another modified script,
crash-test-with-retries-for-schema.sh, which does a few retries in case of
deadlock. It runs the following command in a retry loop :

ALTER PUBLICATION p ADD TABLES IN SCHEMA public

If the command fails, the script sleeps for a random amount of time (bounded
above by a MAXWAIT parameter) and then retries the command. If the command
still hasn't succeeded after the maximum number of retries, the final return
value from the script is DEADLOCK, since we can't do a consistency check in
this scenario. Also attached is another script,
run-with-deadlock-detection.sh, which can run the above script for multiple
iterations.

I tried the test scripts with and without alter_pub_for_schema.patch. Without
the patch, I get the final output ERROR the majority of the time, which means
that the publication was altered successfully but the data was lost on the
subscriber. When I run it with the patch, I get a mix of OK (no data loss)
and DEADLOCK (the publication was not altered), but no ERROR. I think that by
changing the sleep time and the number of retries we can get different
fractions of OK and DEADLOCK.

I am not sure if this is the right or the cleanest way to fix the issue, but
I think conceptually this might be the right direction. Please let me know if
my understanding is wrong or if I'm missing something.

Thanks & Regards,
Nitin Motiani
Google
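P.S. For anyone who doesn't want to open the attachment, the change in
alter_pub_for_schema.patch is conceptually along the following lines. This is
only a sketch of the idea described above, not the patch itself; in
particular, using GetSchemaPublicationRelations() to enumerate the schema's
tables is my assumption about how the enumeration is done, and the exact
placement inside publication_add_schema() may differ.

    /*
     * Sketch only (see the attached patch for the real change): in
     * publication_add_schema(), just before the invalidations for the
     * schema's relations, lock each affected table in
     * ShareRowExclusiveLock mode -- the same lock mode the OpenTableList
     * fix uses for tables added directly to the publication.
     */
    List       *relids;
    ListCell   *lc;

    relids = GetSchemaPublicationRelations(schemaid, PUBLICATION_PART_ALL);

    foreach(lc, relids)
    {
        Oid         relid = lfirst_oid(lc);

        LockRelationOid(relid, ShareRowExclusiveLock);
    }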
alter_pub_for_schema.patch
run-with-deadlock-detection.sh

#!/usr/bin/bash
#
# Usage: ./run-with-deadlock-detection.sh NUMBER_OF_RUNS
#
# Runs crash-test-with-retries-for-schema.sh with randomized parameters in a
# loop, and ensures each run completes within 15 minutes (to prevent infinite
# lockups).
#
# The randomized parameters are
#
#   number of tables - 10 to 110
#   refresh interval - 10 to 110 tables
#   sleep length     - 1 to 11 seconds
#
# Each run stores data in a separate directory, derived from timestamp.
#

RUNS=$1
TIMEOUT=900

#for v in 12 14 16 master; do
for v in master; do

	PUB_VERSION=$v
	SUB_VERSION=$v

	for r in `seq 1 $RUNS`; do

		t=$((10 + RANDOM % 100))	# number of tables
		m=$((10 + RANDOM % 100))	# publication refresh interval
		s=$((1 + RANDOM % 10))		# sleep between steps (seconds)

		OUTDIR=`date +%Y%m%d-%H%M%S`

		if [ -d "$OUTDIR" ]; then
			echo "directory $OUTDIR already exists"
			exit
		fi

		mkdir $OUTDIR

		# make sure the test is terminated after $TIMEOUT seconds (or change TIMEOUT)
		sleep $TIMEOUT && kill `pgrep -f crash-test-with-retries-for-schema.sh` &
		pid=$!

		# run the test with random parameters
		./crash-test-with-retries-for-schema.sh $OUTDIR $t $m $s 5 30 > $OUTDIR/run.log 2>&1
		r=$?

		kill $pid

		# make sure it's dead
		killall -9 -q postgres

		if [ "$r" == "0" ]; then
			echo $OUTDIR $t $m $s : OK
		elif [ "$r" == "1" ]; then
			echo $OUTDIR $t $m $s : ERROR
		else
			echo $OUTDIR $t $m $s : DEADLOCK
		fi

	done

done
crash-test-with-retries-for-schema.sh

#!/usr/bin/bash

#
# ./crash-test-with-retries-for-schema.sh OUTPUT_DIRECTORY NUMBER_OF_TABLES REFRESH_INTERVAL SLEEP_SECONDS NUM_RETRIES MAX_WAIT
#
# OUTPUT_DIRECTORY - directory where all the data directories / logs will be
# NUMBER_OF_TABLES - number of tables to create, add to publication, ...
# REFRESH_INTERVAL - call REFRESH PUBLICATION after adding this many tables
# SLEEP_SECONDS    - number of seconds to sleep between steps
# NUM_RETRIES      - max number of attempts of the ALTER PUBLICATION command
# MAX_WAIT         - upper bound (seconds) on the random sleep between retries
#
#
# What this does (briefly)
#
# - initialize and start two clusters - publisher and subscriber
# - create publication/subscription between them
# - start pgbench in the background, to generate changes
# - create given number of tables, add them to the publication
# - once in a while refresh the subscription to sync tables
# - stop the pgbench
# - wait for the subscriber to fully catch up
# - compare data on publisher/subscriber (simple cross-check)
# - if everything went OK, remove the data directories etc.
# - if there was error, keep the data (but compress, to save space)
#

OUTDIR=$1
NUMTABLES=$2
REFRESH=$3
SLEEP=$4
NUMRETRIES=$5
MAXWAIT=$6

echo $PATH

START_TIMESTAMP=`date +%s`

echo [`date`] [`date +%s`] "NUMTABLES=$NUMTABLES SLEEP=$SLEEP REFRESH=$REFRESH"

# config
PUB_PORT=5001
PUB_DATA=$OUTDIR/data-pub

SUB_PORT=5002
SUB_DATA=$OUTDIR/data-sub

# initialize clusters
pg_ctl -D $PUB_DATA init
pg_ctl -D $SUB_DATA init

echo 'wal_level = logical' >> $PUB_DATA/postgresql.conf
echo "port = $PUB_PORT" >> $PUB_DATA/postgresql.conf
echo "log_line_prefix = '%n %m [%p] [%b] [%a] [%v/%x] '" >> $PUB_DATA/postgresql.conf

echo "port = $SUB_PORT" >> $SUB_DATA/postgresql.conf
echo "log_line_prefix = '%n %m [%p] [%b] [%a] [%v/%x] '" >> $SUB_DATA/postgresql.conf

pg_ctl -D $PUB_DATA -w -l $OUTDIR/pg-pub.log start
pg_ctl -D $SUB_DATA -w -l $OUTDIR/pg-sub.log start

createdb -p $PUB_PORT test > $OUTDIR/init-pub.log 2>&1
createdb -p $SUB_PORT test > $OUTDIR/init-sub.log 2>&1

# create tables on both ends
for t in `seq 1 $NUMTABLES`; do
	psql -p $PUB_PORT test -c "CREATE TABLE test_${t} (id SERIAL PRIMARY KEY, ts timestamptz, lsn pg_lsn, val TEXT)"
	psql -p $SUB_PORT test -c "CREATE TABLE test_${t} (id SERIAL PRIMARY KEY, ts timestamptz, lsn pg_lsn, val TEXT)"
done

# generate insert script for pgbench
echo > insert.sql
for t in `seq 1 $NUMTABLES`; do
	echo "INSERT INTO test_${t} (ts, lsn, val) VALUES (clock_timestamp(), pg_current_wal_insert_lsn(), md5(random()::text));" >> insert.sql
done

sort -R insert.sql > insert.tmp && mv insert.tmp insert.sql

echo "BEGIN;" > insert.tmp
cat insert.sql >> insert.tmp
echo "COMMIT;" >> insert.tmp
mv insert.tmp insert.sql
mv insert.sql $OUTDIR/

# run pgbench on the publisher in the background (1h is overkill, we will terminate it)
pgbench -n -c 8 -f $OUTDIR/insert.sql -p $PUB_PORT -T 3600 test > $OUTDIR/pgbench.log 2>&1 &

# create publication on the publisher
echo [`date`] [`date +%s`] "creating publication"
psql -p $PUB_PORT test -c "CREATE PUBLICATION p"

# sleep for a few seconds
echo [`date`] [`date +%s`] "sleeping"
sleep $SLEEP

# create subscription on the subscriber
echo [`date`] [`date +%s`] "creating subscription"
psql -p $SUB_PORT test -c "CREATE SUBSCRIPTION s CONNECTION 'application_name=test host=localhost user=nitinmotiani dbname=test port=$PUB_PORT' PUBLICATION p"

# sleep for a few seconds
echo [`date`] [`date +%s`] "sleeping"
sleep $SLEEP

# add all tables in the schema to the publication, retrying on failure (e.g. deadlock)
num_retries=$NUMRETRIES
curr=0
until [ "$curr" -ge "$num_retries" ]
do
	echo "Attempt number : $curr"
	psql -p $PUB_PORT test -c "ALTER PUBLICATION p ADD TABLES IN SCHEMA public" && break
	curr=$((curr+1))
	sleep $((RANDOM%$MAXWAIT))
done

deadlock=0
if [ "$curr" == "$num_retries" ]; then
	deadlock=1
fi

refreshed=0

for t in `seq 1 $NUMTABLES`; do
	lsn=`psql -p $PUB_PORT test -t -A -c "SELECT pg_current_wal_lsn()"`

	#echo [`date`] [`date +%s`] "adding table $t to publication [$lsn]"
	#psql -p $PUB_PORT test -c "ALTER PUBLICATION p ADD TABLE test_${t}"

	m=$((t % REFRESH))

	if [ "$m" == "0" ] && [ "$refreshed" == "0" ]; then
		lsn=`psql -p $PUB_PORT test -t -A -c "SELECT pg_current_wal_lsn()"`
		echo [`date`] [`date +%s`] "refreshing publication [$lsn] in loop"
		psql -p $SUB_PORT test -c "ALTER SUBSCRIPTION s REFRESH PUBLICATION"
		psql -p $PUB_PORT test -c "SELECT pg_current_wal_lsn() AS wal_lsn, pg_current_wal_insert_lsn() AS insert_lsn, (SELECT relname FROM pg_class AS c WHERE c.oid = p.prrelid) AS relname, * FROM pg_publication_rel p ORDER BY prrelid" >> $OUTDIR/pg_publication_rel.log 2>&1
		refreshed=1
	fi
done

if [ "$refreshed" == "0" ]; then
	lsn=`psql -p $PUB_PORT test -t -A -c "SELECT pg_current_wal_lsn()"`
	echo [`date`] [`date +%s`] "refreshing publication [$lsn]"
	psql -p $SUB_PORT test -c "ALTER SUBSCRIPTION s REFRESH PUBLICATION"
fi

psql -p $PUB_PORT test -c "SELECT pg_current_wal_lsn() AS wal_lsn, pg_current_wal_insert_lsn() AS insert_lsn, (SELECT relname FROM pg_class AS c WHERE c.oid = p.prrelid) AS relname, * FROM pg_publication_rel p ORDER BY prrelid" >> $OUTDIR/pg_publication_rel.log 2>&1

# wait for all tables to get in sync (essentially what wait_for_subscription_sync does)
test_total=`psql -p $SUB_PORT test -t -A -c "SELECT count(*) FROM pg_subscription_rel"`
echo "Total number of entries in pg_subscription_rel is $test_total"
while /bin/true; do
	echo [`date`] [`date +%s`] "waiting for subscription"
	c=`psql -p $SUB_PORT test -t -A -c "SELECT count(*) FROM pg_subscription_rel WHERE srsubstate = 'r'"`
	if [ "$c" == "$test_total" ]; then
		break
	fi
	echo [`date`] [`date +%s`] "synced $c tables out of $NUMTABLES"
	sleep 1
done

# sleep for a few more seconds, to generate more pgbench changes
echo [`date`] [`date +%s`] "sleeping"
sleep $SLEEP

# kill the pgbench, and wait for all the associated backends to terminate
killall pgbench

while /bin/true; do
	echo [`date`] [`date +%s`] "waiting for pgbench backends to die"
	r=`psql -p $PUB_PORT test -t -A -c "SELECT count(*) FROM pg_stat_activity WHERE application_name = 'pgbench'"`
	if [ "$r" == "0" ]; then
		break
	fi
	sleep 1
done

# get current LSN on publisher and wait for subscriber to fully catch up (same as wait_for_catchup)
lsn=`psql -p $PUB_PORT test -t -A -c "SELECT pg_current_wal_lsn()"`

echo [`date`] [`date +%s`] "waiting for replay of LSN $lsn"

start_lsn=`psql -p $PUB_PORT test -t -A -c "SELECT replay_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = 'test'"`
start_time=`date +%s`

while /bin/true; do
	if [ "$deadlock" != "0" ]; then
		break
	fi

	echo [`date`] [`date +%s`] "waiting for catchup $lsn"

	replay_lsn=`psql -p $PUB_PORT test -t -A -c "SELECT replay_lsn FROM pg_catalog.pg_stat_replication WHERE application_name = 'test'"`
	remaining_wal=`psql -p $PUB_PORT test -t -A -c "SELECT ('$lsn' - replay_lsn) FROM pg_catalog.pg_stat_replication WHERE application_name = 'test'"`
	remaining_kb=$((remaining_wal/1024))

	current_time=`date +%s`
	delta_time=$((current_time - start_time + 1))
	delta_wal=`psql -p $PUB_PORT test -t -A -c "SELECT (replay_lsn - '$start_lsn') FROM pg_catalog.pg_stat_replication WHERE application_name = 'test'"`
	total_wal=`psql -p $PUB_PORT test -t -A -c "SELECT ('$lsn'::pg_lsn - '$start_lsn'::pg_lsn) FROM pg_catalog.pg_stat_replication WHERE application_name = 'test'"`

	speed=$((delta_wal / delta_time))

	if [ "$speed" != "0" ]; then
		remaining_sec=$((remaining_wal / speed))
	else
		remaining_sec='???'
	fi

	r=`psql -p $PUB_PORT test -t -A -c "SELECT replay_lsn >= '$lsn' AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name = 'test'"`
	if [ "$r" == "t" ]; then
		break
	fi

	echo [`date`] [`date +%s`] "replay LSN $replay_lsn, remaining $remaining_kb kB / $remaining_sec seconds"

	sleep 1
done

# dump basic info about subscription and subscriber
psql -p $SUB_PORT test -c "SELECT oid, relname FROM pg_class AS c WHERE relname like 'test_%' ORDER BY relname" > $OUTDIR/pg_class.log 2>&1
psql -p $SUB_PORT test -c "SELECT (SELECT relname FROM pg_class AS c WHERE c.oid = r.srrelid) AS relname, * FROM pg_subscription_rel AS r ORDER BY 1" > $OUTDIR/pg_subscription_rel.log 2>&1
psql -p $PUB_PORT test -c "SELECT pg_current_wal_lsn() AS wal_lsn, pg_current_wal_insert_lsn() AS insert_lsn, * FROM pg_stat_replication" > $OUTDIR/pg_stat_replication.log 2>&1

# check consistency - compare number of records, sum, ... could be made more thorough, but good enough
echo [`date`] [`date +%s`] "caught up, check consistency"

r="OK"
c=0
for t in `seq 1 $NUMTABLES`; do
	a=`psql -p $PUB_PORT test -t -A -c "SELECT count(*), sum(id) FROM test_${t}"`
	b=`psql -p $SUB_PORT test -t -A -c "SELECT count(*), sum(id) FROM test_${t}"`

	if [ "$a" == "$b" ]; then
		echo [`date`] [`date +%s`] "$t OK ($a $b)"
	else
		echo [`date`] [`date +%s`] "$t ERROR ($a $b)"
		r="ERROR"
		c=1
	fi
done

if [ "$deadlock" == "1" ]; then
	r="DEADLOCK"
	c=2
fi

# dump table contents on both ends
for t in `seq 1 $NUMTABLES`; do
	psql -p $PUB_PORT test -c "SELECT * FROM test_${t} ORDER BY id" > $OUTDIR/test_${t}_pub.data
	psql -p $SUB_PORT test -c "SELECT * FROM test_${t} ORDER BY id" > $OUTDIR/test_${t}_sub.data
done

pg_ctl -D $PUB_DATA -w -m immediate stop
pg_ctl -D $SUB_DATA -w -m immediate stop

# if everything went OK, remove the data directories
if [ "$r" == "OK" ]; then
	rm -Rf $PUB_DATA
	rm -Rf $SUB_DATA

	# remove the log files too, may be pretty large due to debugging
	rm $OUTDIR/pg-pub.log $OUTDIR/pg-sub.log
else
	# for error, compress the data directories
	tar -c $PUB_DATA | gzip --fast > $PUB_DATA.tgz
	tar -c $SUB_DATA | gzip --fast > $SUB_DATA.tgz

	# and also the log files
	pigz --fast $OUTDIR/pg-pub.log $OUTDIR/pg-sub.log
fi

date

END_TIMESTAMP=`date +%s`

echo [`date`] [`date +%s`] "RUNTIME: " $((END_TIMESTAMP - START_TIMESTAMP))
echo [`date`] [`date +%s`] "RESULT: $r"

exit $c