On 2024/09/26 21:57, torikoshia wrote:
Updated the patches.
Thanks for updating the patches! I've made some changes based on your work,
which are attached. Barring any objections, I'm thinking of pushing these
patches.

For patches 0001 and 0003, I ran pgindent and updated the commit message.

Regarding patch 0002:

- I updated the regression test to run ANALYZE on the file_fdw foreign table
  since the on_error option also affects the ANALYZE command. To ensure test
  stability, the test now runs ANALYZE with log_verbosity = 'silent'.

- I removed the code that updated the count of skipped rows for the
  pg_stat_progress_copy view. As far as I know, file_fdw doesn't currently
  support tracking pg_stat_progress_copy.tuples_processed. Supporting only
  tuples_skipped seems inconsistent, so I suggest creating a separate patch
  to extend file_fdw to track both tuples_processed and tuples_skipped in
  this view.

- I refactored the for-loop handling on_error = 'ignore' in
  fileIterateForeignScan() by replacing it with a goto statement for
  improved readability.

- I modified file_fdw to log a NOTICE message about skipped rows at the end
  of ANALYZE if any rows are skipped due to the on_error = 'ignore' setting.

  Regarding the "file contains XXX rows" message reported by the ANALYZE
  VERBOSE command on the file_fdw foreign table, what number should be
  reflected in XXX, especially when some rows are skipped due to
  on_error = 'ignore'? Currently, the count only includes valid rows,
  excluding any skipped rows. I haven't modified this code yet. Should we
  instead count all rows (valid and erroneous) and report that total?

  I noticed the code for reporting the number of skipped rows due to
  on_error = 'ignore' appears in three places. I'm considering creating a
  common function for this reporting to eliminate redundancy but haven't
  implemented it yet.

- I've updated the commit message and run pgindent.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
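The common reporting function mentioned above is not part of the attached patches. As a rough, standalone illustration of what such a helper could look like, the sketch below folds the three conditions (on_error = 'ignore', at least one skipped row, log_verbosity not 'silent') into one place. The function name format_skipped_rows_notice, its parameters, and the use of snprintf() in place of ereport()/errmsg_plural() are assumptions made for this example only; the enum is a local stand-in that copies the values from patch 0001.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local stand-in for the enum in commands/copy.h as changed by patch 0001. */
typedef enum CopyLogVerbosityChoice
{
    COPY_LOG_VERBOSITY_SILENT = -1,
    COPY_LOG_VERBOSITY_DEFAULT = 0,
    COPY_LOG_VERBOSITY_VERBOSE
} CopyLogVerbosityChoice;

/*
 * Hypothetical helper (not in the patches): writes the summary message into
 * buf and returns true when a NOTICE would be emitted, false otherwise.
 * The real code would call ereport(NOTICE, errmsg_plural(...)) instead of
 * snprintf(); the simple "== 1" plural test here only covers English.
 */
static bool
format_skipped_rows_notice(char *buf, size_t buflen, bool on_error_ignore,
                           CopyLogVerbosityChoice verbosity,
                           uint64_t num_errors)
{
    if (!on_error_ignore ||
        num_errors == 0 ||
        verbosity < COPY_LOG_VERBOSITY_DEFAULT)
        return false;

    if (num_errors == 1)
        snprintf(buf, buflen,
                 "%llu row was skipped due to data type incompatibility",
                 (unsigned long long) num_errors);
    else
        snprintf(buf, buflen,
                 "%llu rows were skipped due to data type incompatibility",
                 (unsigned long long) num_errors);
    return true;
}
```

With a helper of this shape, each of the three call sites would reduce to a single call made just before EndCopyFrom().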
From 6bcf56dc0556b3e9ded7200229c05c69e9c4fd6a Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:28:15 +0900
Subject: [PATCH v6 1/3] Add log_verbosity = 'silent' support to COPY command.

Previously, when the on_error option was set to ignore, the COPY command
would always log NOTICE messages for input rows discarded due to
data type incompatibility. Users had no way to suppress these messages.

This commit introduces a new log_verbosity setting, 'silent',
which prevents the COPY command from emitting NOTICE messages
when on_error = 'ignore' is used, even if rows are discarded.
This feature is particularly useful when processing malformed
files frequently, where a flood of NOTICE messages can be undesirable.

For example, when frequently loading malformed files via the COPY command
or querying foreign tables using file_fdw (with an upcoming patch to
add on_error support for file_fdw), users may prefer to suppress
these messages to reduce log noise and improve clarity.

Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 doc/src/sgml/ref/copy.sgml          | 10 +++++++---
 src/backend/commands/copy.c         |  4 +++-
 src/backend/commands/copyfrom.c     |  3 ++-
 src/bin/psql/tab-complete.c         |  2 +-
 src/include/commands/copy.h         |  4 +++-
 src/test/regress/expected/copy2.out |  4 +++-
 src/test/regress/sql/copy2.sql      |  4 ++++
 7 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..d87684a5be 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       <literal>verbose</literal>, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
       conversion has failed is emitted for each discarded row.
+      When it is set to <literal>silent</literal>, no message is emitted
+      regarding ignored rows.
      </para>
     </listitem>
    </varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <listitem>
      <para>
       Specify the amount of messages emitted by a <command>COPY</command>
-      command: <literal>default</literal> or <literal>verbose</literal>. If
-      <literal>verbose</literal> is specified, additional messages are emitted
-      during processing.
+      command: <literal>default</literal>, <literal>verbose</literal>, or
+      <literal>silent</literal>.
+      If <literal>verbose</literal> is specified, additional messages are
+      emitted during processing.
+      <literal>silent</literal> suppresses both verbose and default messages.
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
 	char	   *sval;
 
 	/*
-	 * Allow "default", or "verbose" values.
+	 * Allow "silent", "default", or "verbose" values.
 	 */
 	sval = defGetString(def);
+	if (pg_strcasecmp(sval, "silent") == 0)
+		return COPY_LOG_VERBOSITY_SILENT;
 	if (pg_strcasecmp(sval, "default") == 0)
 		return COPY_LOG_VERBOSITY_DEFAULT;
 	if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
 	error_context_stack = errcallback.previous;
 
 	if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-		cstate->num_errors > 0)
+		cstate->num_errors > 0 &&
+		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
 		ereport(NOTICE,
 				errmsg_plural("%llu row was skipped due to data type incompatibility",
 							  "%llu rows were skipped due to data type incompatibility",
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7d..6530b0f1ce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2916,7 +2916,7 @@ psql_completion(const char *text, int start, int end)
 
 	/* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
-		COMPLETE_WITH("default", "verbose");
+		COMPLETE_WITH("silent", "default", "verbose");
 
 	/* Complete COPY <sth> FROM <sth> WITH (<options>) */
 	else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..6f64d97fdd 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,7 +45,9 @@ typedef enum CopyOnErrorChoice
  */
 typedef enum CopyLogVerbosityChoice
 {
-	COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+	COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+	COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+									 * the default, assign 0 */
 	COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
 NOTICE:  skipping row due to data type incompatibility at line 2 for column "l": null input
 CONTEXT:  COPY check_ign_err2
 NOTICE:  1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
 -- reset context choice
 \set SHOW_CONTEXT errors
 SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
  n |  m  | k |   l
 ---+-----+---+-------
  1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
 
 -- test datatype error that can't be handled as soft: should fail
 CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
 1 {1} 1 'foo'
 2 {2} 2 \N
 \.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3 {3} 3 'bar'
+4 {4} 4 \N
+\.
 
 -- reset context choice
 \set SHOW_CONTEXT errors
-- 
2.45.2
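Patch 0001 gives 'silent' the value -1 so that the existing test log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT excludes it without touching the other values. The standalone sketch below mirrors the option parsing and that ordering. It is only an illustration: parse_log_verbosity and equals_ignore_case are hypothetical names, the latter stands in for pg_strcasecmp(), and an unknown value returns false here whereas the real defGetCopyLogVerbosityChoice() raises an error.

```c
#include <ctype.h>
#include <stdbool.h>

/* Local copy of the enum values introduced by patch 0001. */
typedef enum CopyLogVerbosityChoice
{
    COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
    COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages */
    COPY_LOG_VERBOSITY_VERBOSE      /* logs additional messages */
} CopyLogVerbosityChoice;

/* Hypothetical stand-in for pg_strcasecmp(a, b) == 0 (ASCII only). */
static bool
equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/*
 * Hypothetical parser: stores the parsed choice in *out and returns true,
 * or returns false for an unrecognized value.
 */
static bool
parse_log_verbosity(const char *sval, CopyLogVerbosityChoice *out)
{
    if (equals_ignore_case(sval, "silent"))
        *out = COPY_LOG_VERBOSITY_SILENT;
    else if (equals_ignore_case(sval, "default"))
        *out = COPY_LOG_VERBOSITY_DEFAULT;
    else if (equals_ignore_case(sval, "verbose"))
        *out = COPY_LOG_VERBOSITY_VERBOSE;
    else
        return false;
    return true;
}
```

Because the three values are ordered silent < default < verbose, a single comparison against COPY_LOG_VERBOSITY_DEFAULT is enough to decide whether the summary NOTICE is shown.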
From 1fb0d7a1d4af8961edb56e8d05400a118e9a584c Mon Sep 17 00:00:00 2001
From: Fujii Masao <fu...@postgresql.org>
Date: Mon, 30 Sep 2024 23:05:26 +0900
Subject: [PATCH v6 2/3] file_fdw: Add on_error and log_verbosity options to file_fdw.

In v17, the on_error and log_verbosity options were introduced for
the COPY command. This commit extends support for these options
to file_fdw.

Setting on_error = 'ignore' for a file_fdw foreign table allows users
to query it without errors, even when the input file contains
malformed rows, by skipping the problematic rows.

Both on_error and log_verbosity options apply to SELECT and ANALYZE
operations on file_fdw foreign tables.

Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 contrib/file_fdw/expected/file_fdw.out | 19 +++++++
 contrib/file_fdw/file_fdw.c            | 72 +++++++++++++++++++++++---
 contrib/file_fdw/sql/file_fdw.sql      |  7 +++
 doc/src/sgml/file-fdw.sgml             | 23 ++++++++
 4 files changed, 113 insertions(+), 8 deletions(-)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..593fdc782e 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,25 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE:  1 row was skipped due to data type incompatibility
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+  a  |   b
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
+ANALYZE agg_bad;
 -- misc query tests
 \t on
 SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..1e28c20797 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,6 +22,7 @@
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
 #include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -74,6 +75,8 @@ static const struct FileFdwOption valid_options[] = {
 	{"null", ForeignTableRelationId},
 	{"default", ForeignTableRelationId},
 	{"encoding", ForeignTableRelationId},
+	{"on_error", ForeignTableRelationId},
+	{"log_verbosity", ForeignTableRelationId},
 	{"force_not_null", AttributeRelationId},
 	{"force_null", AttributeRelationId},
 
@@ -725,12 +728,12 @@ fileIterateForeignScan(ForeignScanState *node)
 	ExprContext *econtext;
 	MemoryContext oldcontext;
 	TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-	bool		found;
+	CopyFromState cstate = festate->cstate;
 	ErrorContextCallback errcallback;
 
 	/* Set up callback to identify error line number. */
 	errcallback.callback = CopyFromErrorCallback;
-	errcallback.arg = (void *) festate->cstate;
+	errcallback.arg = (void *) cstate;
 	errcallback.previous = error_context_stack;
 	error_context_stack = &errcallback;
 
@@ -751,10 +754,27 @@ fileIterateForeignScan(ForeignScanState *node)
 	 * switch in case we are doing that.
 	 */
 	oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-	found = NextCopyFrom(festate->cstate, econtext,
-						 slot->tts_values, slot->tts_isnull);
-	if (found)
+
+retry:
+	if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+	{
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+			cstate->escontext->error_occurred)
+		{
+			/*
+			 * Soft error occurred, skip this tuple and just make
+			 * ErrorSaveContext ready for the next NextCopyFrom. Since we
+			 * don't set details_wanted and error_data is not to be filled,
+			 * just resetting error_occurred is enough.
+			 */
+			cstate->escontext->error_occurred = false;
+
+			/* Repeat NextCopyFrom() until no soft error occurs */
+			goto retry;
+		}
+
 		ExecStoreVirtualTuple(slot);
+	}
 
 	/* Switch back to original memory context */
 	MemoryContextSwitchTo(oldcontext);
@@ -796,8 +816,19 @@ fileEndForeignScan(ForeignScanState *node)
 	FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
 
 	/* if festate is NULL, we are in EXPLAIN; nothing to do */
-	if (festate)
-		EndCopyFrom(festate->cstate);
+	if (!festate)
+		return;
+
+	if (festate->cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+		festate->cstate->num_errors > 0 &&
+		festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+		ereport(NOTICE,
+				errmsg_plural("%llu row was skipped due to data type incompatibility",
+							  "%llu rows were skipped due to data type incompatibility",
+							  (unsigned long long) festate->cstate->num_errors,
+							  (unsigned long long) festate->cstate->num_errors));
+
+	EndCopyFrom(festate->cstate);
 }
 
 /*
@@ -1113,7 +1144,8 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * which must have at least targrows entries.
  * The actual number of rows selected is returned as the function result.
  * We also count the total number of rows in the file and return it into
- * *totalrows.  Note that *totaldeadrows is always set to 0.
+ * *totalrows. Rows skipped due to on_error = 'ignore' are not included
+ * in this count. Note that *totaldeadrows is always set to 0.
  *
  * Note that the returned list of rows is not always in order by physical
  * position in the file.  Therefore, correlation estimates derived later
@@ -1191,6 +1223,21 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 		if (!found)
 			break;
 
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+			cstate->escontext->error_occurred)
+		{
+			/*
+			 * Soft error occurred, skip this tuple and just make
+			 * ErrorSaveContext ready for the next NextCopyFrom. Since we
+			 * don't set details_wanted and error_data is not to be filled,
+			 * just resetting error_occurred is enough.
+			 */
+			cstate->escontext->error_occurred = false;
+
+			/* Repeat NextCopyFrom() until no soft error occurs */
+			continue;
+		}
+
 		/*
 		 * The first targrows sample rows are simply copied into the
 		 * reservoir. Then we start replacing tuples in the sample until we
@@ -1236,6 +1283,15 @@ file_acquire_sample_rows(Relation onerel, int elevel,
 	/* Clean up. */
 	MemoryContextDelete(tupcontext);
 
+	if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+		cstate->num_errors > 0 &&
+		cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+		ereport(NOTICE,
+				errmsg_plural("%llu row was skipped due to data type incompatibility",
+							  "%llu rows were skipped due to data type incompatibility",
+							  (unsigned long long) cstate->num_errors,
+							  (unsigned long long) cstate->num_errors));
+
 	EndCopyFrom(cstate);
 
 	pfree(values);
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..edd77c5cd2 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,13 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ANALYZE agg_bad;
+
 -- misc query tests
 \t on
 SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>on_error</literal></term>
+
+   <listitem>
+    <para>
+     Specifies how to behave when encountering an error converting a column's
+     input value into its data type,
+     the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><literal>log_verbosity</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+     the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
  </variablelist>
 
  <para>
-- 
2.45.2
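The skip-and-retry control flow that patch 0002 adds to fileIterateForeignScan() can be tried out in isolation. The sketch below is a simplified model, not file_fdw code: MockCopyState, mock_next_row, and fetch_valid_row are hypothetical names, rows are plain integers, and a negative value plays the role of a row that fails input conversion. As with NextCopyFrom() under on_error = 'ignore', the mock reader returns true for a bad row and only flags it through error_occurred, leaving the caller to skip it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the parts of CopyFromState the scan loop uses. */
typedef struct MockCopyState
{
    const int  *rows;           /* input; a negative value marks a bad row */
    size_t      nrows;
    size_t      pos;            /* index of the next row to read */
    bool        error_occurred; /* mirrors escontext->error_occurred */
    uint64_t    num_errors;     /* mirrors cstate->num_errors */
} MockCopyState;

/*
 * Stand-in for NextCopyFrom(): returns false at end of input. For a bad
 * row it still returns true, but sets error_occurred and counts the error.
 */
static bool
mock_next_row(MockCopyState *st, int *value)
{
    if (st->pos >= st->nrows)
        return false;

    *value = st->rows[st->pos++];
    if (*value < 0)
    {
        st->error_occurred = true;
        st->num_errors++;
    }
    return true;
}

/*
 * Returns true with *value set to the next valid row, or false at end of
 * input. Bad rows are skipped by clearing the flag and reading again.
 */
static bool
fetch_valid_row(MockCopyState *st, int *value)
{
retry:
    if (!mock_next_row(st, value))
        return false;

    if (st->error_occurred)
    {
        /* Make the state ready for the next read, then try again. */
        st->error_occurred = false;
        goto retry;
    }
    return true;
}
```

After the scan finishes, num_errors holds the number of skipped rows, which is the value the summary NOTICE reports.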
From 68663c230bbfc54e8bd730258e3a1a420eb0a92e Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:30:26 +0900
Subject: [PATCH v6 3/3] Refactor CopyFrom() in copyfrom.c.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit simplifies CopyFrom() by removing the unnecessary local variable
'skipped', which tracked the number of rows skipped due to on_error = 'ignore'.
That count is already handled by cstate->num_errors, so the 'skipped' variable
was redundant.

Additionally, the condition on_error != COPY_ON_ERROR_STOP is removed.
Since on_error == COPY_ON_ERROR_IGNORE is already checked, and on_error
only has two values (ignore and stop), the additional check was redundant
and made the logic harder to read. Seemingly this was introduced
in preparation for a future patch, but the current checks don’t offer
clear value and have been removed to improve readability.

Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 src/backend/commands/copyfrom.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 47879994f7..9139a40785 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,7 +657,6 @@ CopyFrom(CopyFromState cstate)
 	CopyMultiInsertInfo multiInsertInfo = {0};	/* pacify compiler */
 	int64		processed = 0;
 	int64		excluded = 0;
-	int64		skipped = 0;
 	bool		has_before_insert_row_trig;
 	bool		has_instead_insert_row_trig;
 	bool		leafpart_use_multi_insert = false;
@@ -1004,26 +1003,22 @@ CopyFrom(CopyFromState cstate)
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
 			break;
 
-		if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+		if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
 			cstate->escontext->error_occurred)
 		{
 			/*
-			 * Soft error occurred, skip this tuple and deal with error
-			 * information according to ON_ERROR.
+			 * Soft error occurred, skip this tuple and just make
+			 * ErrorSaveContext ready for the next NextCopyFrom. Since we
+			 * don't set details_wanted and error_data is not to be filled,
+			 * just resetting error_occurred is enough.
 			 */
-			if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
-				/*
-				 * Just make ErrorSaveContext ready for the next NextCopyFrom.
-				 * Since we don't set details_wanted and error_data is not to
-				 * be filled, just resetting error_occurred is enough.
-				 */
-				cstate->escontext->error_occurred = false;
+			cstate->escontext->error_occurred = false;
 
 			/* Report that this tuple was skipped by the ON_ERROR clause */
 			pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
-										 ++skipped);
+										 cstate->num_errors);
 
+			/* Repeat NextCopyFrom() until no soft error occurs */
 			continue;
 		}
 
-- 
2.45.2