On Sat, Jan 11, 2025 at 5:54 PM Kirill Reshke <reshkekir...@gmail.com> wrote:
> On Fri, 10 Jan 2025 at 11:38, jian he <jian.universal...@gmail.com> wrote:
> > I think there are three remaining issues that may need more attention
> > 1.
> > Table 27.42. pg_stat_progress_copy View
> > (<structname>pg_stat_progress_copy</structname>)
> > column pg_stat_progress_copy.tuples_skipped now the description is
> > """
> > When the ON_ERROR option is set to ignore, this value shows the number of
> > tuples skipped due to malformed data. When the ON_ERROR option is set to
> > set_to_null, this value shows the number of tuples where malformed data was
> > converted to NULL.
> > """
> > With this description, the column name tuples_skipped is no longer
> > suitable for (on_error set_to_null): the tuple is not skipped, only some
> > of its values are set to null.
> Indeed this is something we need to fix.
> > Or
> > we can skip progress reports for (on_error set_to_null) case.
> Maybe we can add a `malformed_tuples` column to this view?
We can do this later.
So for on_error set_to_null, I've removed the pgstat_progress_update_param
related code.

The attached patch also includes some doc and error message enhancements.
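
To make the row-counting semantics easier to review, here is a small standalone sketch of what on_error set_to_null is meant to do per row. It is not backend code: parse_int_soft and convert_row_set_to_null are hypothetical names, every column is assumed to be an int, and a plain not_null flag stands in for the domain not-null check that the patch performs through a second InputFunctionCallSafe call.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a soft-error input function: parse a decimal int. */
static bool
parse_int_soft(const char *s, int *out)
{
	char	   *end;
	long		v;

	if (s == NULL || *s == '\0')
		return false;
	errno = 0;
	v = strtol(s, &end, 10);
	if (errno != 0 || *end != '\0' || v < INT_MIN || v > INT_MAX)
		return false;
	*out = (int) v;
	return true;
}

/*
 * Convert one input row under the assumed set_to_null semantics.
 *
 * Returns false for a hard error: a conversion failed in a column that
 * does not accept NULL. Otherwise returns true, and *row_had_error is
 * set when at least one field of the row was replaced with NULL, so the
 * caller can count rows rather than fields.
 */
static bool
convert_row_set_to_null(const char *const *fields, size_t nfields,
						const bool *not_null, int *values, bool *nulls,
						bool *row_had_error)
{
	assert(fields && not_null && values && nulls && row_had_error);

	*row_had_error = false;
	for (size_t i = 0; i < nfields; i++)
	{
		nulls[i] = false;
		if (parse_int_soft(fields[i], &values[i]))
			continue;
		if (not_null[i])
			return false;		/* NULL not allowed here: give up on the row */
		values[i] = 0;
		nulls[i] = true;
		*row_had_error = true;
	}
	return true;
}
```

Feeding it the three data rows from the new regression test (10/a/d, 11/b/12, 13/14/e) with only the first column marked not-null replaces four fields but flags three rows, which matches the "3 rows" in the final NOTICE of the expected output.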
From a95d42bf1e6044c6c9a2afbb15d168d6679eceab Mon Sep 17 00:00:00 2001
From: jian he <jian.universal...@gmail.com>
Date: Tue, 14 Jan 2025 13:46:12 +0800
Subject: [PATCH v11 1/1] COPY (on_error set_to_null)

Extend "on_error action" by introducing a new option: on_error set_to_null.
Current grammar makes us unable to use "on_error null", so we choose "on_error set_to_null".

Any data type conversion errors during the COPY FROM process will result in the
affected column being set to NULL. This is only applicable when using the
non-binary format for COPY FROM. However, the not-null constraint will still be
enforced. If a conversion error leads to a NULL value in a column that has a
not-null constraint, a not-null constraint violation error will be triggered.

A regression test for a domain with a not-null constraint has been added.

Author: Jian He <jian.universal...@gmail.com>,
Author: Kirill Reshke <reshkekir...@gmail.com>

Fujii Masao <masao.fu...@oss.nttdata.com>,
Jim Jones <jim.jo...@uni-muenster.de>,
"David G. Johnston" <david.g.johns...@gmail.com>,
Yugo NAGATA <nag...@sraoss.co.jp>,
torikoshia <torikos...@oss.nttdata.com>

discussion: https://postgr.es/m/CAKFQuwawy1e6YR4S=j+y7pXqg_Dw1WBVrgvf=bp3d1_asfe...@mail.gmail.com
 doc/src/sgml/ref/copy.sgml               | 34 +++++++++-----
 src/backend/commands/copy.c              |  6 ++-
 src/backend/commands/copyfrom.c          | 29 ++++++++----
 src/backend/commands/copyfromparse.c     | 46 +++++++++++++++++-
 src/bin/psql/tab-complete.in.c           |  2 +-
 src/include/commands/copy.h              |  1 +
 src/include/commands/copyfrom_internal.h |  4 +-
 src/test/regress/expected/copy2.out      | 60 ++++++++++++++++++++++++
 src/test/regress/sql/copy2.sql           | 46 ++++++++++++++++++
 9 files changed, 201 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 8394402f09..5e1d08ab91 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -394,23 +394,34 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       Specifies how to behave when encountering an error converting a column's
       input value into its data type.
       An <replaceable class="parameter">error_action</replaceable> value of
-      <literal>stop</literal> means fail the command, while
-      <literal>ignore</literal> means discard the input row and continue with the next one.
+      <literal>stop</literal> means fail the command,
+      <literal>ignore</literal> means discard the input row and continue with the next one, and
+      <literal>set_to_null</literal> means replace columns containing erroneous input values with
+      <literal>NULL</literal> and move to the next field.
       The default is <literal>stop</literal>.
-      The <literal>ignore</literal> option is applicable only for <command>COPY FROM</command>
+      The <literal>ignore</literal> and <literal>set_to_null</literal>
+      options are applicable only for <command>COPY FROM</command>
       when the <literal>FORMAT</literal> is <literal>text</literal> or <literal>csv</literal>.
-      A <literal>NOTICE</literal> message containing the ignored row count is
-      emitted at the end of the <command>COPY FROM</command> if at least one
-      row was discarded. When <literal>LOG_VERBOSITY</literal> option is set to
-      <literal>verbose</literal>, a <literal>NOTICE</literal> message
+      For <literal>ignore</literal> option,
+      a <literal>NOTICE</literal> message containing the ignored row count is
+      emitted at the end of the <command>COPY FROM</command> if at least one row was discarded.
+      For <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the number of rows in which erroneous input values were replaced with
+      <literal>NULL</literal> is emitted at the end of the <command>COPY FROM</command> if at least one value was replaced.
+     </para>
+     <para>
+      When <literal>LOG_VERBOSITY</literal> option is set to <literal>verbose</literal>,
+      for <literal>ignore</literal> option, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
-      conversion has failed is emitted for each discarded row.
-      When it is set to <literal>silent</literal>, no message is emitted
-      regarding ignored rows.
+      conversion has failed is emitted for each discarded row;
+      for <literal>set_to_null</literal> option,
+      a <literal>NOTICE</literal> message containing the line of the input file and the column name
+      whose value was replaced with <literal>NULL</literal> is emitted for each input conversion failure.
+      When it is set to <literal>silent</literal>, no message is emitted regarding rows with input conversion failures.
@@ -458,7 +469,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       This is currently used in <command>COPY FROM</command> command when
-      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>.
+      <literal>ON_ERROR</literal> option is set to <literal>ignore</literal>
+      or <literal>set_to_null</literal>.
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index cfca9d9dc2..afe60758d4 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -403,12 +403,14 @@ defGetCopyOnErrorChoice(DefElem *def, ParseState *pstate, bool is_from)
 				 parser_errposition(pstate, def->location)));
-	 * Allow "stop", or "ignore" values.
+	 * Allow "stop", "ignore", "set_to_null" values.
 	if (pg_strcasecmp(sval, "stop") == 0)
 	if (pg_strcasecmp(sval, "ignore") == 0)
+	if (pg_strcasecmp(sval, "set_to_null") == 0)
@@ -918,7 +920,7 @@ ProcessCopyOptions(ParseState *pstate,
 				 errmsg("only ON_ERROR STOP is allowed in BINARY mode")));
-	if (opts_out->reject_limit && !opts_out->on_error)
+	if (opts_out->reject_limit && opts_out->on_error != COPY_ON_ERROR_IGNORE)
 		/*- translator: first and second %s are the names of COPY option, e.g.
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 0cbd05f560..c38ff3dc6f 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1321,14 +1321,22 @@ CopyFrom(CopyFromState cstate)
 	/* Done, clean up */
 	error_context_stack = errcallback.previous;
-	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0 &&
+	if (cstate->num_errors > 0 &&
 		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
-		ereport(NOTICE,
-				errmsg_plural("%llu row was skipped due to data type incompatibility",
-							  "%llu rows were skipped due to data type incompatibility",
-							  (unsigned long long) cstate->num_errors,
-							  (unsigned long long) cstate->num_errors));
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+			ereport(NOTICE,
+					errmsg_plural("%llu row was skipped due to data type incompatibility",
+								  "%llu rows were skipped due to data type incompatibility",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+		else if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+			ereport(NOTICE,
+					errmsg_plural("erroneous values in %llu row were replaced with null",
+								  "erroneous values in %llu rows were replaced with null",
+								  (unsigned long long) cstate->num_errors,
+								  (unsigned long long) cstate->num_errors));
+	}
 	if (bistate != NULL)
@@ -1474,10 +1482,11 @@ BeginCopyFrom(ParseState *pstate,
 		cstate->escontext->error_occurred = false;
-		 * Currently we only support COPY_ON_ERROR_IGNORE. We'll add other
-		 * options later
+		 * Currently we only support COPY_ON_ERROR_IGNORE, COPY_ON_ERROR_NULL.
+		 * We'll add other options later
-		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE ||
+			cstate->opts.on_error == COPY_ON_ERROR_NULL)
 			cstate->escontext->details_wanted = false;
diff --git a/src/backend/commands/copyfromparse.c b/src/backend/commands/copyfromparse.c
index caccdc8563..c0f6ce5057 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -871,6 +871,7 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 		int			fldct;
 		int			fieldno;
 		char	   *string;
+		bool		current_row_erroneous = false;
 		/* read raw fields in the next line */
 		if (!NextCopyFromRawFields(cstate, &field_strings, &fldct))
@@ -949,7 +950,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			 * If ON_ERROR is specified with IGNORE, skip rows with soft
-			 * errors
+			 * errors. If ON_ERROR is specified with set_to_null, try
+			 * to replace with null.
 			else if (!InputFunctionCallSafe(&in_functions[m],
@@ -960,9 +962,47 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 				Assert(cstate->opts.on_error != COPY_ON_ERROR_STOP);
+				if (cstate->opts.on_error == COPY_ON_ERROR_NULL)
+				{
+					/*
+					 * We use this to count the number of rows (not fields) in
+					 * which on_error set_to_null was successfully applied.
+					 */
+					if (!current_row_erroneous)
+						current_row_erroneous = true;
+					/*
+					 * We need another InputFunctionCallSafe call so that we can
+					 * raise a not-null violation for a domain with a not-null constraint.
+					 */
+					cstate->escontext->error_occurred = false;
+					if (InputFunctionCallSafe(&in_functions[m],
+											  NULL,
+											  typioparams[m],
+											  att->atttypmod,
+											  (Node *) cstate->escontext,
+											  &values[m]))
+					{
+						nulls[m] = true;
+						values[m] = (Datum) 0;
+						if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+							ereport(NOTICE,
+									errmsg("column \"%s\" was set to null due to data type incompatibility at line %llu",
+											cstate->cur_attname,
+											(unsigned long long) cstate->cur_lineno));
+						continue;
+					}
+					else
+						ereport(ERROR,
+								errmsg("domain %s does not allow null values", format_type_be(typioparams[m])),
+								errdatatype(typioparams[m]));
+				}
-				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE)
+				if (cstate->opts.log_verbosity == COPY_LOG_VERBOSITY_VERBOSE &&
+					cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
 					 * Since we emit line number and column info in the below
@@ -1001,6 +1041,8 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,
 			cstate->cur_attval = NULL;
+		if (current_row_erroneous)
+			cstate->num_errors++;
 		Assert(fieldno == attr_count);
diff --git a/src/bin/psql/tab-complete.in.c b/src/bin/psql/tab-complete.in.c
index 81cbf10aa2..04a155ad5f 100644
--- a/src/bin/psql/tab-complete.in.c
+++ b/src/bin/psql/tab-complete.in.c
@@ -3250,7 +3250,7 @@ match_previous_words(int pattern_id,
 	/* Complete COPY <sth> FROM|TO filename WITH (FORMAT */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "FORMAT"))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 06dfdfef72..7ebf4f7893 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -38,6 +38,7 @@ typedef enum CopyOnErrorChoice
 	COPY_ON_ERROR_STOP = 0,		/* immediately throw errors, default */
 	COPY_ON_ERROR_IGNORE,		/* ignore errors */
+	COPY_ON_ERROR_NULL,			/* set error field to null */
 } CopyOnErrorChoice;
diff --git a/src/include/commands/copyfrom_internal.h b/src/include/commands/copyfrom_internal.h
index 1d8ac8f62e..50759eaf1c 100644
--- a/src/include/commands/copyfrom_internal.h
+++ b/src/include/commands/copyfrom_internal.h
@@ -98,7 +98,9 @@ typedef struct CopyFromStateData
 	ErrorSaveContext *escontext;	/* soft error trapped during in_functions
 									 * execution */
 	uint64		num_errors;		/* total number of rows which contained soft
-								 * errors */
+								 * errors, for ON_ERROR set_to_null, it's the
+								 * number of rows successfully converted to null
+								*/
 	int		   *defmap;			/* array of default att numbers related to
 								 * missing att */
 	ExprState **defexprs;		/* array of default att expressions for all
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 64ea33aeae..9a5acef8db 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -81,6 +81,10 @@ COPY x from stdin (on_error ignore, on_error ignore);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
+ERROR:  conflicting or redundant options
+LINE 1: COPY x from stdin (on_error set_to_null, on_error ignore);
+                                                 ^
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 ERROR:  conflicting or redundant options
 LINE 1: COPY x from stdin (log_verbosity default, log_verbosity verb...
@@ -92,6 +96,10 @@ COPY x from stdin (format BINARY, null 'x');
 ERROR:  cannot specify NULL in BINARY mode
 COPY x from stdin (format BINARY, on_error ignore);
 ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x from stdin (format BINARY, on_error set_to_null);
+ERROR:  only ON_ERROR STOP is allowed in BINARY mode
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 ERROR:  COPY ON_ERROR "unsupported" not recognized
 LINE 1: COPY x from stdin (on_error unsupported);
@@ -124,6 +132,10 @@ COPY x to stdout (format BINARY, on_error unsupported);
 ERROR:  COPY ON_ERROR cannot be used with COPY TO
 LINE 1: COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
+ERROR:  COPY ON_ERROR cannot be used with COPY TO
+LINE 1: COPY x to stdin (on_error set_to_null);
+                         ^
 COPY x from stdin (log_verbosity unsupported);
 ERROR:  COPY LOG_VERBOSITY "unsupported" not recognized
 LINE 1: COPY x from stdin (log_verbosity unsupported);
@@ -769,6 +781,51 @@ CONTEXT:  COPY check_ign_err
 NOTICE:  skipping row due to data type incompatibility at line 8 for column "k": "a"
 CONTEXT:  COPY check_ign_err
 NOTICE:  6 rows were skipped due to data type incompatibility
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: null input
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "a"
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+ERROR:  domain d_int_not_null does not allow null values
+CONTEXT:  COPY t_on_error_null, line 1, column a: "-1"
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  missing data for column "c"
+CONTEXT:  COPY t_on_error_null, line 1: "1,1"
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+ERROR:  extra data after last expected column
+CONTEXT:  COPY t_on_error_null, line 1: "1,2,3,4"
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+NOTICE:  column "b" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column b: "a"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 1
+CONTEXT:  COPY t_on_error_null, line 1, column c: "d"
+NOTICE:  column "b" was set to null due to data type incompatibility at line 2
+CONTEXT:  COPY t_on_error_null, line 2, column b: "b"
+NOTICE:  column "c" was set to null due to data type incompatibility at line 3
+CONTEXT:  COPY t_on_error_null, line 3, column c: "e"
+NOTICE:  erroneous values in 3 rows were replaced with null
+-- check inserted content
+select * from t_on_error_null;
+ a  |  b   |  c   
+ 10 | NULL | NULL
+ 11 | NULL |   12
+ 13 |   14 | NULL
+(3 rows)
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -828,6 +885,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 45273557ce..003a91648e 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -67,12 +67,15 @@ COPY x from stdin (force_null (a), force_null (b));
 COPY x from stdin (convert_selectively (a), convert_selectively (b));
 COPY x from stdin (encoding 'sql_ascii', encoding 'sql_ascii');
 COPY x from stdin (on_error ignore, on_error ignore);
+COPY x from stdin (on_error set_to_null, on_error ignore);
 COPY x from stdin (log_verbosity default, log_verbosity verbose);
 -- incorrect options
 COPY x from stdin (format BINARY, delimiter ',');
 COPY x from stdin (format BINARY, null 'x');
 COPY x from stdin (format BINARY, on_error ignore);
+COPY x from stdin (format BINARY, on_error set_to_null);
+COPY x FROM stdin (on_error set_to_null, reject_limit 2);
 COPY x from stdin (on_error unsupported);
 COPY x from stdin (format TEXT, force_quote(a));
 COPY x from stdin (format TEXT, force_quote *);
@@ -87,6 +90,7 @@ COPY x from stdin (format TEXT, force_null *);
 COPY x to stdout (format CSV, force_null(a));
 COPY x to stdout (format CSV, force_null *);
 COPY x to stdout (format BINARY, on_error unsupported);
+COPY x to stdin (on_error set_to_null);
 COPY x from stdin (log_verbosity unsupported);
 COPY x from stdin with (reject_limit 1);
 COPY x from stdin with (on_error ignore, reject_limit 0);
@@ -534,6 +538,45 @@ a	{2}	2
 8	{8}	8
+CREATE DOMAIN d_int_not_null AS INT NOT NULL CHECK(value > 0);
+CREATE DOMAIN d_int_positive_maybe_null AS INT CHECK(value > 0);
+CREATE TABLE t_on_error_null (a d_int_not_null, b d_int_positive_maybe_null, c INT);
+\pset null NULL
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+\N	11	13
+--fail, column a is domain with not-null constraint
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+a	11	14
+--fail, column a cannot set to null value
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null);
+-1	11	13
+--fail. less data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+--fail. extra data
+COPY t_on_error_null FROM STDIN WITH (delimiter ',', on_error set_to_null);
+COPY t_on_error_null FROM STDIN WITH (on_error set_to_null, log_verbosity verbose);
+10	a	d
+11	b	12
+13	14	e
+-- check inserted content
+select * from t_on_error_null;
+\pset null ''
 -- tests for on_error option with log_verbosity and null constraint via domain
 CREATE DOMAIN dcheck_ign_err2 varchar(15) NOT NULL;
 CREATE TABLE check_ign_err2 (n int, m int[], k int, l dcheck_ign_err2);
@@ -603,6 +646,9 @@ DROP VIEW instead_of_insert_tbl_view;
 DROP VIEW instead_of_insert_tbl_view_2;
 DROP FUNCTION fun_instead_of_insert_tbl();
 DROP TABLE check_ign_err;
+DROP TABLE t_on_error_null;
+DROP DOMAIN d_int_not_null;
+DROP DOMAIN d_int_positive_maybe_null;
 DROP TABLE check_ign_err2;
 DROP DOMAIN dcheck_ign_err2;
 DROP TABLE hard_err;
