On Tue, Jul 10, 2018 at 05:35:58PM +0300, Heikki Linnakangas wrote:
> Thanks for picking this up!
> 
> (I hope this gets through the email filters this time, sending a shell
> script seems to be difficult. I also trimmed the CC list, if that helps.)
> 
> On 04/07/18 07:59, Michael Paquier wrote:
>> Hence I propose the patch attached which disables the TRUNCATE and COPY
>> optimizations for two cases, which are the ones actually causing
>> problems.  One solution has been presented by Simon here for COPY, which
>> is to disable the optimization when there are no blocks on a relation
>> with wal_level = minimal:
>> https://www.postgresql.org/message-id/CANP8+jKN4V4MJEzFN_iEtdZ+1oM=yetxvmuu1yk4umxqy2g...@mail.gmail.com
>> For back-patching, I find that really appealing.
> 
> This fails in the case that there are any WAL-logged changes to the table
> while the COPY is running. That can happen at least if the table has an
> INSERT trigger, that performs operations on the same table, and the COPY
> fires the trigger. That scenario is covered by the little bash script I
> posted earlier in this thread
> (https://www.postgresql.org/message-id/55AFC302.1060805%40iki.fi). Attached
> is a new version of that script, updated to make it work with v11.

Thanks for the pointer.  My tap test has been covering two out of the
three scenarios you have in your script.  I have been able to convert
the extra as the attached, and I have added as well an extra test with
TRUNCATE triggers.  So it seems to me that we want to disable the
optimization if any type of trigger are defined on the relation copied
to as it could be possible that these triggers work on the blocks copied
as well, for any BEFORE/AFTER and STATEMENT/ROW triggers.  What do you
think?

>> The second thing that the patch attached does is to tweak
>> ExecuteTruncateGuts so as the TRUNCATE optimization never runs for
>> wal_level = minimal.
> 
> If we go down that route, let's at least keep the TRUNCATE optimization for
> temporary and unlogged tables.

Yes, that sounds right.  Fixed as well.  I have additionally done more
work on the comments.

Thoughts?
--
Michael
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3a66cb5025..7674369613 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -42,6 +42,7 @@
 #include "parser/parse_relation.h"
 #include "port/pg_bswap.h"
 #include "rewrite/rewriteHandler.h"
+#include "storage/bufmgr.h"
 #include "storage/fd.h"
 #include "tcop/tcopprot.h"
 #include "utils/builtins.h"
@@ -2367,13 +2368,23 @@ CopyFrom(CopyState cstate)
 	/*----------
 	 * Check to see if we can avoid writing WAL
 	 *
-	 * If archive logging/streaming is not enabled *and* either
-	 *	- table was created in same transaction as this COPY
+	 * WAL can be skipped if all the following conditions are satisfied:
+	 *	- table was created in same transaction as this COPY.
+	 *  - archive logging/streaming is enabled.
 	 *	- data is being written to relfilenode created in this transaction
 	 * then we can skip writing WAL.  It's safe because if the transaction
 	 * doesn't commit, we'll discard the table (or the new relfilenode file).
 	 * If it does commit, we'll have done the heap_sync at the bottom of this
 	 * routine first.
+	 *  - No triggers are defined on the relation, particularly BEFORE/AFTER
+	 * ROW INSERT triggers could try to write data to the same block copied
+	 * to when the INSERT are WAL-logged.
+	 *  - No actions which write an init block for any of the buffers that
+	 * will be touched during COPY have happened.  Since there is no way of
+	 * knowing at present which ones these are, we must use a simple but
+	 * effective heuristic to ensure safety of the COPY operation for all
+	 * cases, which is in this case to check that the relation copied to has
+	 * zero blocks.
 	 *
 	 * As mentioned in comments in utils/rel.h, the in-same-transaction test
 	 * is not always set correctly, since in rare cases rd_newRelfilenodeSubid
@@ -2404,7 +2415,10 @@ CopyFrom(CopyState cstate)
 		cstate->rel->rd_newRelfilenodeSubid != InvalidSubTransactionId)
 	{
 		hi_options |= HEAP_INSERT_SKIP_FSM;
-		if (!XLogIsNeeded())
+
+		if (!XLogIsNeeded() &&
+			cstate->rel->trigdesc == NULL &&
+			RelationGetNumberOfBlocks(cstate->rel) == 0)
 			hi_options |= HEAP_INSERT_SKIP_WAL;
 	}
 
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 7c0cf0d7ee..150f8c1fd2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -1562,10 +1562,16 @@ ExecuteTruncateGuts(List *explicit_rels, List *relids, List *relids_logged,
 		 * the table was either created in the current (sub)transaction or has
 		 * a new relfilenode in the current (sub)transaction, then we can just
 		 * truncate it in-place, because a rollback would cause the whole
-		 * table or the current physical file to be thrown away anyway.
+		 * table or the current physical file to be thrown away anyway.  This
+		 * optimization is not safe with wal_level = minimal as there is no
+		 * actual way to know which are the blocks that could have been
+		 * touched by another operation done within this same transaction, be
+		 * it INSERT or COPY.  Non-permanent relations can also safely use
+		 * this optimization as they don't rely on WAL at recovery.
 		 */
-		if (rel->rd_createSubid == mySubid ||
-			rel->rd_newRelfilenodeSubid == mySubid)
+		if ((XLogIsNeeded() || !RelationNeedsWAL(rel)) &&
+			(rel->rd_createSubid == mySubid ||
+			 rel->rd_newRelfilenodeSubid == mySubid))
 		{
 			/* Immediate, non-rollbackable truncation is OK */
 			heap_truncate_one_rel(rel);
diff --git a/src/test/recovery/t/016_wal_optimize.pl b/src/test/recovery/t/016_wal_optimize.pl
new file mode 100644
index 0000000000..310772a2b3
--- /dev/null
+++ b/src/test/recovery/t/016_wal_optimize.pl
@@ -0,0 +1,192 @@
+# Test WAL replay for optimized TRUNCATE and COPY records
+#
+# WAL truncation is optimized in some cases with TRUNCATE and COPY queries
+# which sometimes interact badly with the other optimizations in line with
+# several setting values of wal_level, particularly when using "minimal" or
+# "replica".  The optimization may be enabled or disabled depending on the
+# scenarios dealt here, and should never result in any type of failures or
+# data loss.
+use strict;
+use warnings;
+
+use PostgresNode;
+use TestLib;
+use Test::More tests => 14;
+
+# Wrapper routine tunable for wal_level.
+sub run_wal_optimize
+{
+	my $wal_level = shift;
+
+	# Primary needs to have wal_level = minimal here
+	my $node = get_new_node("node_$wal_level");
+	$node->init;
+	$node->append_conf('postgresql.conf', qq(
+wal_level = $wal_level
+));
+	$node->start;
+
+	# Test direct truncation optimization.  No tuples
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test1 (id serial PRIMARY KEY);
+		TRUNCATE test1;
+		COMMIT;");
+
+	$node->stop('immediate');
+	$node->start;
+
+	my $result = $node->safe_psql('postgres', "SELECT count(*) FROM test1;");
+	is($result, qq(0),
+	   "wal_level = $wal_level, optimized truncation with empty table");
+
+	# Test truncation with inserted tuples within the same transaction.
+	# Tuples inserted after the truncation should be seen.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test2 (id serial PRIMARY KEY);
+		INSERT INTO test2 VALUES (DEFAULT);
+		TRUNCATE test2;
+		INSERT INTO test2 VALUES (DEFAULT);
+		COMMIT;");
+
+	$node->stop('immediate');
+	$node->start;
+
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test2;");
+	is($result, qq(1),
+	   "wal_level = $wal_level, optimized truncation with inserted table");
+
+	# Data file for COPY query in follow-up tests.
+	my $basedir = $node->basedir;
+	my $copy_file = "$basedir/copy_data.txt";
+	TestLib::append_to_file($copy_file, qq(20000,30000
+20001,30001
+20002,30002));
+
+	# Test truncation with inserted tuples using COPY.  Tuples copied after the
+	# truncation should be seen.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test3 (id serial PRIMARY KEY, id2 int);
+		INSERT INTO test3 (id, id2) VALUES (DEFAULT, generate_series(1,10000));
+		TRUNCATE test3;
+		COPY test3 FROM '$copy_file' DELIMITER ',';
+		COMMIT;");
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test3;");
+	is($result, qq(3),
+	   "wal_level = $wal_level, optimized truncation with copied table");
+
+	# Test truncation with inserted tuples using both INSERT and COPY. Tuples
+	# inserted after the truncation should be seen.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test4 (id serial PRIMARY KEY, id2 int);
+		INSERT INTO test4 (id, id2) VALUES (DEFAULT, generate_series(1,10000));
+		TRUNCATE test4;
+		INSERT INTO test4 (id, id2) VALUES (DEFAULT, 10000);
+		COPY test4 FROM '$copy_file' DELIMITER ',';
+		INSERT INTO test4 (id, id2) VALUES (DEFAULT, 10000);
+		COMMIT;");
+
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test4;");
+	is($result, qq(5),
+	   "wal_level = $wal_level, optimized truncation with inserted/copied table");
+
+	# Test consistency of COPY with INSERT for table created in the same
+	# transaction.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test5 (id serial PRIMARY KEY, id2 int);
+		INSERT INTO test5 VALUES (DEFAULT, 1);
+		COPY test5 FROM '$copy_file' DELIMITER ',';
+		COMMIT;");
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test5;");
+	is($result, qq(4),
+	   "wal_level = $wal_level, replay of optimized copy with inserted table");
+
+	# Test consistency of COPY that inserts more to the same table using
+	# triggers.  If the INSERTS from the trigger go to the same block data
+	# is copied to, and the INSERTs are WAL-logged, WAL replay will fail when
+	# it tries to replay the WAL record but the "before" image doesn't match,
+	# because not all changes were WAL-logged.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test6 (id serial PRIMARY KEY, id2 text);
+		CREATE FUNCTION test6_before_row_trig() RETURNS trigger
+		  LANGUAGE plpgsql as \$\$
+		  BEGIN
+		    IF new.id2 NOT LIKE 'triggered%' THEN
+		      INSERT INTO test6 VALUES (DEFAULT, 'triggered row before' || NEW.id2);
+		    END IF;
+		    RETURN NEW;
+		  END; \$\$;
+		CREATE FUNCTION test6_after_row_trig() RETURNS trigger
+		  LANGUAGE plpgsql as \$\$
+		  BEGIN
+		    IF new.id2 NOT LIKE 'triggered%' THEN
+		      INSERT INTO test6 VALUES (DEFAULT, 'triggered row after' || OLD.id2);
+		    END IF;
+		    RETURN NEW;
+		  END; \$\$;
+		CREATE TRIGGER test6_before_row_insert
+		  BEFORE INSERT ON test6
+		  FOR EACH ROW EXECUTE PROCEDURE test6_before_row_trig();
+		CREATE TRIGGER test6_after_row_insert
+		  AFTER INSERT ON test6
+		  FOR EACH ROW EXECUTE PROCEDURE test6_after_row_trig();
+		COPY test6 FROM '$copy_file' DELIMITER ',';
+		COMMIT;");
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test6;");
+	is($result, qq(9),
+	   "wal_level = $wal_level, replay of optimized copy with before trigger");
+
+	# Test consistency of INSERT, COPY and TRUNCATE in same transaction block
+	# with TRUNCATE triggers.
+	$node->safe_psql('postgres', "
+		BEGIN;
+		CREATE TABLE test7 (id serial PRIMARY KEY, id2 text);
+		CREATE FUNCTION test7_before_stat_trig() RETURNS trigger
+		  LANGUAGE plpgsql as \$\$
+		  BEGIN
+		    INSERT INTO test7 VALUES (DEFAULT, 'triggered stat before');
+		    RETURN NULL;
+		  END; \$\$;
+		CREATE FUNCTION test7_after_stat_trig() RETURNS trigger
+		  LANGUAGE plpgsql as \$\$
+		  BEGIN
+		    INSERT INTO test7 VALUES (DEFAULT, 'triggered stat before');
+		    RETURN NULL;
+		  END; \$\$;
+		CREATE TRIGGER test7_before_stat_truncate
+		  BEFORE TRUNCATE ON test7
+		  FOR EACH STATEMENT EXECUTE PROCEDURE test7_before_stat_trig();
+		CREATE TRIGGER test7_after_stat_truncate
+		  AFTER TRUNCATE ON test7
+		  FOR EACH STATEMENT EXECUTE PROCEDURE test7_after_stat_trig();
+		INSERT INTO test7 VALUES (DEFAULT, 1);
+		TRUNCATE test7;
+		COPY test7 FROM '$copy_file' DELIMITER ',';
+		COMMIT;");
+	$node->stop('immediate');
+	$node->start;
+	$result = $node->safe_psql('postgres', "SELECT count(*) FROM test7;");
+	is($result, qq(4),
+	   "wal_level = $wal_level, replay of optimized copy with before trigger");
+
+	$node->teardown_node;
+	$node->clean_node;
+	return;
+}
+
+# Run same test suite for multiple wal_level values.
+run_wal_optimize("minimal");
+run_wal_optimize("replica");

Attachment: signature.asc
Description: PGP signature

Reply via email to