On 2024/09/26 21:57, torikoshia wrote:
> Updated the patches.

Thanks for updating the patches! I’ve made some changes based on your work, 
which are attached.
Barring any objections, I plan to push these patches.

For patches 0001 and 0003, I ran pgindent and updated the commit message.

Regarding patch 0002:

- I updated the regression test to run ANALYZE on the file_fdw foreign table
  since the on_error option also affects the ANALYZE command. To ensure test
  stability, the test now runs ANALYZE with log_verbosity = 'silent'.

- I removed the code that updated the count of skipped rows for
  the pg_stat_progress_copy view. As far as I know, file_fdw doesn’t
  currently support tracking pg_stat_progress_copy.tuples_processed.
  Supporting only tuples_skipped seems inconsistent, so I suggest creating
  a separate patch to extend file_fdw to track both tuples_processed and
  tuples_skipped in this view.

- I refactored the for-loop handling on_error = 'ignore' in
  fileIterateForeignScan() by replacing it with a goto statement for
  improved readability.

- I modified file_fdw to log a NOTICE message about skipped rows at the end of
  ANALYZE if any rows are skipped due to the on_error = 'ignore' setting.

  Regarding the "file contains XXX rows" message reported by the ANALYZE VERBOSE
  command on the file_fdw foreign table, what number should be reflected in XXX,
  especially when some rows are skipped due to on_error = 'ignore'?
  Currently, the count only includes valid rows, excluding any skipped rows.
  I haven't modified this code yet. Should we instead count all rows
  (valid and erroneous) and report that total?

  I noticed the code for reporting the number of skipped rows due to
  on_error = 'ignore' appears in three places. I’m considering creating
  a common function for this reporting to eliminate redundancy but haven’t
  implemented it yet.

- I’ve updated the commit message and run pgindent.
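
As a rough illustration of the common reporting function mentioned above, here is a minimal standalone sketch. It is not part of the attached patches: the name format_skipped_rows_notice() and its signature are hypothetical, and it writes into a caller-supplied buffer instead of calling ereport() so that it compiles outside the PostgreSQL tree. The enum values mirror those in patch 0001, where 'silent' sorts below 'default' so that a single >= comparison decides whether to report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Same ordering as in patch 0001: SILENT sorts below DEFAULT. */
typedef enum CopyLogVerbosityChoice
{
    COPY_LOG_VERBOSITY_SILENT = -1,
    COPY_LOG_VERBOSITY_DEFAULT = 0,
    COPY_LOG_VERBOSITY_VERBOSE
} CopyLogVerbosityChoice;

/*
 * Hypothetical helper (an assumption, not in the patches): build the
 * "rows skipped" summary in buf.  Returns 1 if a message was produced,
 * 0 if nothing should be reported (no skipped rows, or verbosity is
 * below DEFAULT, i.e. 'silent').
 */
int
format_skipped_rows_notice(char *buf, size_t buflen,
                           unsigned long long num_errors,
                           CopyLogVerbosityChoice log_verbosity)
{
    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    if (num_errors == 0 || log_verbosity < COPY_LOG_VERBOSITY_DEFAULT)
        return 0;

    /* Singular and plural forms, matching the wording used in the patches. */
    if (num_errors == 1)
        snprintf(buf, buflen,
                 "%llu row was skipped due to data type incompatibility",
                 num_errors);
    else
        snprintf(buf, buflen,
                 "%llu rows were skipped due to data type incompatibility",
                 num_errors);
    return 1;
}
```

With something along these lines, each of the three call sites could reduce to one call. In the real code the helper would presumably call ereport(NOTICE, ...) with errmsg_plural() directly so the message stays translatable.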

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
From 6bcf56dc0556b3e9ded7200229c05c69e9c4fd6a Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:28:15 +0900
Subject: [PATCH v6 1/3] Add log_verbosity = 'silent' support to COPY command.

Previously, when the on_error option was set to ignore, the COPY command
would always log NOTICE messages for input rows discarded due to
data type incompatibility. Users had no way to suppress these messages.

This commit introduces a new log_verbosity setting, 'silent',
which prevents the COPY command from emitting NOTICE messages
when on_error = 'ignore' is used, even if rows are discarded.
This feature is particularly useful when processing malformed files
frequently, where a flood of NOTICE messages can be undesirable.

For example, when frequently loading malformed files via the COPY command
or querying foreign tables using file_fdw (with an upcoming patch to
add on_error support for file_fdw), users may prefer to suppress
these messages to reduce log noise and improve clarity.

Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 doc/src/sgml/ref/copy.sgml          | 10 +++++++---
 src/backend/commands/copy.c         |  4 +++-
 src/backend/commands/copyfrom.c     |  3 ++-
 src/bin/psql/tab-complete.c         |  2 +-
 src/include/commands/copy.h         |  4 +++-
 src/test/regress/expected/copy2.out |  4 +++-
 src/test/regress/sql/copy2.sql      |  4 ++++
 7 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 1518af8a04..d87684a5be 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -407,6 +407,8 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
       <literal>verbose</literal>, a <literal>NOTICE</literal> message
       containing the line of the input file and the column name whose input
       conversion has failed is emitted for each discarded row.
+      When it is set to <literal>silent</literal>, no message is emitted
+      regarding ignored rows.
      </para>
     </listitem>
    </varlistentry>
@@ -428,9 +430,11 @@ COPY { <replaceable class="parameter">table_name</replaceable> [ ( <replaceable
     <listitem>
      <para>
       Specify the amount of messages emitted by a <command>COPY</command>
-      command: <literal>default</literal> or <literal>verbose</literal>. If
-      <literal>verbose</literal> is specified, additional messages are emitted
-      during processing.
+      command: <literal>default</literal>, <literal>verbose</literal>, or
+      <literal>silent</literal>.
+      If <literal>verbose</literal> is specified, additional messages are
+      emitted during processing.
+      <literal>silent</literal> suppresses both verbose and default messages.
      </para>
      <para>
       This is currently used in <command>COPY FROM</command> command when
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 3bb579a3a4..03eb7a4eba 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -427,9 +427,11 @@ defGetCopyLogVerbosityChoice(DefElem *def, ParseState *pstate)
        char       *sval;
 
        /*
-        * Allow "default", or "verbose" values.
+        * Allow "silent", "default", or "verbose" values.
         */
        sval = defGetString(def);
+       if (pg_strcasecmp(sval, "silent") == 0)
+               return COPY_LOG_VERBOSITY_SILENT;
        if (pg_strcasecmp(sval, "default") == 0)
                return COPY_LOG_VERBOSITY_DEFAULT;
        if (pg_strcasecmp(sval, "verbose") == 0)
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 2d3462913e..47879994f7 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -1320,7 +1320,8 @@ CopyFrom(CopyFromState cstate)
        error_context_stack = errcallback.previous;
 
        if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
-               cstate->num_errors > 0)
+               cstate->num_errors > 0 &&
+               cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
                ereport(NOTICE,
                                errmsg_plural("%llu row was skipped due to data type incompatibility",
                                                          "%llu rows were skipped due to data type incompatibility",
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a7ccde6d7d..6530b0f1ce 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2916,7 +2916,7 @@ psql_completion(const char *text, int start, int end)
 
        /* Complete COPY <sth> FROM filename WITH (LOG_VERBOSITY */
        else if (Matches("COPY|\\copy", MatchAny, "FROM|TO", MatchAny, "WITH", "(", "LOG_VERBOSITY"))
-               COMPLETE_WITH("default", "verbose");
+               COMPLETE_WITH("silent", "default", "verbose");
 
        /* Complete COPY <sth> FROM <sth> WITH (<options>) */
        else if (Matches("COPY|\\copy", MatchAny, "FROM", MatchAny, "WITH", MatchAny))
diff --git a/src/include/commands/copy.h b/src/include/commands/copy.h
index 141fd48dc1..6f64d97fdd 100644
--- a/src/include/commands/copy.h
+++ b/src/include/commands/copy.h
@@ -45,7 +45,9 @@ typedef enum CopyOnErrorChoice
  */
 typedef enum CopyLogVerbosityChoice
 {
-       COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages, default */
+       COPY_LOG_VERBOSITY_SILENT = -1, /* logs none */
+       COPY_LOG_VERBOSITY_DEFAULT = 0, /* logs no additional messages. As this is
+                                                                        * the default, assign 0 */
        COPY_LOG_VERBOSITY_VERBOSE, /* logs additional messages */
 } CopyLogVerbosityChoice;
 
diff --git a/src/test/regress/expected/copy2.out b/src/test/regress/expected/copy2.out
index 61a19cdc4c..4e752977b5 100644
--- a/src/test/regress/expected/copy2.out
+++ b/src/test/regress/expected/copy2.out
@@ -760,6 +760,7 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
 NOTICE:  skipping row due to data type incompatibility at line 2 for column "l": null input
 CONTEXT:  COPY check_ign_err2
 NOTICE:  1 row was skipped due to data type incompatibility
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
 -- reset context choice
 \set SHOW_CONTEXT errors
 SELECT * FROM check_ign_err;
@@ -774,7 +775,8 @@ SELECT * FROM check_ign_err2;
  n |  m  | k |   l   
 ---+-----+---+-------
  1 | {1} | 1 | 'foo'
-(1 row)
+ 3 | {3} | 3 | 'bar'
+(2 rows)
 
 -- test datatype error that can't be handled as soft: should fail
 CREATE TABLE hard_err(foo widget);
diff --git a/src/test/regress/sql/copy2.sql b/src/test/regress/sql/copy2.sql
index 8b14962194..fa6aa17344 100644
--- a/src/test/regress/sql/copy2.sql
+++ b/src/test/regress/sql/copy2.sql
@@ -533,6 +533,10 @@ COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity verbose);
 1      {1}     1       'foo'
 2      {2}     2       \N
 \.
+COPY check_ign_err2 FROM STDIN WITH (on_error ignore, log_verbosity silent);
+3      {3}     3       'bar'
+4      {4}     4       \N
+\.
 
 -- reset context choice
 \set SHOW_CONTEXT errors
-- 
2.45.2

From 1fb0d7a1d4af8961edb56e8d05400a118e9a584c Mon Sep 17 00:00:00 2001
From: Fujii Masao <fu...@postgresql.org>
Date: Mon, 30 Sep 2024 23:05:26 +0900
Subject: [PATCH v6 2/3] file_fdw: Add on_error and log_verbosity options to
 file_fdw.

In v17, the on_error and log_verbosity options were introduced for
the COPY command. This commit extends support for these options
to file_fdw.

Setting on_error = 'ignore' for a file_fdw foreign table allows users
to query it without errors, even when the input file contains
malformed rows, by skipping the problematic rows.

Both on_error and log_verbosity options apply to SELECT and ANALYZE
operations on file_fdw foreign tables.

Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 contrib/file_fdw/expected/file_fdw.out | 19 +++++++
 contrib/file_fdw/file_fdw.c            | 72 +++++++++++++++++++++++---
 contrib/file_fdw/sql/file_fdw.sql      |  7 +++
 doc/src/sgml/file-fdw.sgml             | 23 ++++++++
 4 files changed, 113 insertions(+), 8 deletions(-)

diff --git a/contrib/file_fdw/expected/file_fdw.out b/contrib/file_fdw/expected/file_fdw.out
index 86c148a86b..593fdc782e 100644
--- a/contrib/file_fdw/expected/file_fdw.out
+++ b/contrib/file_fdw/expected/file_fdw.out
@@ -206,6 +206,25 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 SELECT * FROM agg_bad;               -- ERROR
 ERROR:  invalid input syntax for type real: "aaa"
 CONTEXT:  COPY agg_bad, line 3, column b: "aaa"
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+NOTICE:  1 row was skipped due to data type incompatibility
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+  a  |   b    
+-----+--------
+ 100 | 99.097
+  42 | 324.78
+(2 rows)
+
+ANALYZE agg_bad;
 -- misc query tests
 \t on
 SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index d16821f8e1..1e28c20797 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -22,6 +22,7 @@
 #include "catalog/pg_authid.h"
 #include "catalog/pg_foreign_table.h"
 #include "commands/copy.h"
+#include "commands/copyfrom_internal.h"
 #include "commands/defrem.h"
 #include "commands/explain.h"
 #include "commands/vacuum.h"
@@ -74,6 +75,8 @@ static const struct FileFdwOption valid_options[] = {
        {"null", ForeignTableRelationId},
        {"default", ForeignTableRelationId},
        {"encoding", ForeignTableRelationId},
+       {"on_error", ForeignTableRelationId},
+       {"log_verbosity", ForeignTableRelationId},
        {"force_not_null", AttributeRelationId},
        {"force_null", AttributeRelationId},
 
@@ -725,12 +728,12 @@ fileIterateForeignScan(ForeignScanState *node)
        ExprContext *econtext;
        MemoryContext oldcontext;
        TupleTableSlot *slot = node->ss.ss_ScanTupleSlot;
-       bool            found;
+       CopyFromState cstate = festate->cstate;
        ErrorContextCallback errcallback;
 
        /* Set up callback to identify error line number. */
        errcallback.callback = CopyFromErrorCallback;
-       errcallback.arg = (void *) festate->cstate;
+       errcallback.arg = (void *) cstate;
        errcallback.previous = error_context_stack;
        error_context_stack = &errcallback;
 
@@ -751,10 +754,27 @@ fileIterateForeignScan(ForeignScanState *node)
         * switch in case we are doing that.
         */
        oldcontext = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
-       found = NextCopyFrom(festate->cstate, econtext,
-                                                slot->tts_values, slot->tts_isnull);
-       if (found)
+
+retry:
+       if (NextCopyFrom(cstate, econtext, slot->tts_values, slot->tts_isnull))
+       {
+               if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+                       cstate->escontext->error_occurred)
+               {
+                       /*
+                        * Soft error occurred, skip this tuple and just make
+                        * ErrorSaveContext ready for the next NextCopyFrom. Since we
+                        * don't set details_wanted and error_data is not to be filled,
+                        * just resetting error_occurred is enough.
+                        */
+                       cstate->escontext->error_occurred = false;
+
+                       /* Repeat NextCopyFrom() until no soft error occurs */
+                       goto retry;
+               }
+
                ExecStoreVirtualTuple(slot);
+       }
 
        /* Switch back to original memory context */
        MemoryContextSwitchTo(oldcontext);
@@ -796,8 +816,19 @@ fileEndForeignScan(ForeignScanState *node)
        FileFdwExecutionState *festate = (FileFdwExecutionState *) node->fdw_state;
 
        /* if festate is NULL, we are in EXPLAIN; nothing to do */
-       if (festate)
-               EndCopyFrom(festate->cstate);
+       if (!festate)
+               return;
+
+       if (festate->cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+               festate->cstate->num_errors > 0 &&
+               festate->cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+               ereport(NOTICE,
+                               errmsg_plural("%llu row was skipped due to data type incompatibility",
+                                                         "%llu rows were skipped due to data type incompatibility",
+                                                         (unsigned long long) festate->cstate->num_errors,
+                                                         (unsigned long long) festate->cstate->num_errors));
+
+       EndCopyFrom(festate->cstate);
 }
 
 /*
@@ -1113,7 +1144,8 @@ estimate_costs(PlannerInfo *root, RelOptInfo *baserel,
  * which must have at least targrows entries.
  * The actual number of rows selected is returned as the function result.
  * We also count the total number of rows in the file and return it into
- * *totalrows.  Note that *totaldeadrows is always set to 0.
+ * *totalrows.  Rows skipped due to on_error = 'ignore' are not included
+ * in this count.  Note that *totaldeadrows is always set to 0.
  *
  * Note that the returned list of rows is not always in order by physical
  * position in the file.  Therefore, correlation estimates derived later
@@ -1191,6 +1223,21 @@ file_acquire_sample_rows(Relation onerel, int elevel,
                if (!found)
                        break;
 
+               if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+                       cstate->escontext->error_occurred)
+               {
+                       /*
+                        * Soft error occurred, skip this tuple and just make
+                        * ErrorSaveContext ready for the next NextCopyFrom. Since we
+                        * don't set details_wanted and error_data is not to be filled,
+                        * just resetting error_occurred is enough.
+                        */
+                       cstate->escontext->error_occurred = false;
+
+                       /* Repeat NextCopyFrom() until no soft error occurs */
+                       continue;
+               }
+
                /*
                 * The first targrows sample rows are simply copied into the
                 * reservoir.  Then we start replacing tuples in the sample until we
@@ -1236,6 +1283,15 @@ file_acquire_sample_rows(Relation onerel, int elevel,
        /* Clean up. */
        MemoryContextDelete(tupcontext);
 
+       if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
+               cstate->num_errors > 0 &&
+               cstate->opts.log_verbosity >= COPY_LOG_VERBOSITY_DEFAULT)
+               ereport(NOTICE,
+                               errmsg_plural("%llu row was skipped due to data type incompatibility",
+                                                         "%llu rows were skipped due to data type incompatibility",
+                                                         (unsigned long long) cstate->num_errors,
+                                                         (unsigned long long) cstate->num_errors));
+
        EndCopyFrom(cstate);
 
        pfree(values);
diff --git a/contrib/file_fdw/sql/file_fdw.sql b/contrib/file_fdw/sql/file_fdw.sql
index f0548e14e1..edd77c5cd2 100644
--- a/contrib/file_fdw/sql/file_fdw.sql
+++ b/contrib/file_fdw/sql/file_fdw.sql
@@ -150,6 +150,13 @@ SELECT * FROM agg_csv c JOIN agg_text t ON (t.a = c.a) ORDER BY c.a;
 -- error context report tests
 SELECT * FROM agg_bad;               -- ERROR
 
+-- on_error and log_verbosity tests
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD on_error 'ignore');
+SELECT * FROM agg_bad;
+ALTER FOREIGN TABLE agg_bad OPTIONS (ADD log_verbosity 'silent');
+SELECT * FROM agg_bad;
+ANALYZE agg_bad;
+
 -- misc query tests
 \t on
 SELECT explain_filter('EXPLAIN (VERBOSE, COSTS FALSE) SELECT * FROM agg_csv');
diff --git a/doc/src/sgml/file-fdw.sgml b/doc/src/sgml/file-fdw.sgml
index f2f2af9a59..bb3579b077 100644
--- a/doc/src/sgml/file-fdw.sgml
+++ b/doc/src/sgml/file-fdw.sgml
@@ -126,6 +126,29 @@
    </listitem>
   </varlistentry>
 
+  <varlistentry>
+   <term><literal>on_error</literal></term>
+
+   <listitem>
+    <para>
+     Specifies how to behave when encountering an error converting a column's
+     input value into its data type,
+     the same as <command>COPY</command>'s <literal>ON_ERROR</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
+  <varlistentry>
+   <term><literal>log_verbosity</literal></term>
+
+   <listitem>
+    <para>
+     Specifies the amount of messages emitted by <literal>file_fdw</literal>,
+     the same as <command>COPY</command>'s <literal>LOG_VERBOSITY</literal> option.
+    </para>
+   </listitem>
+  </varlistentry>
+
  </variablelist>
 
  <para>
-- 
2.45.2

From 68663c230bbfc54e8bd730258e3a1a420eb0a92e Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi <torikos...@oss.nttdata.com>
Date: Wed, 25 Sep 2024 21:30:26 +0900
Subject: [PATCH v6 3/3] Refactor CopyFrom() in copyfrom.c.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This commit simplifies CopyFrom() by removing the unnecessary local variable
'skipped', which tracked the number of rows skipped due to on_error = 'ignore'.
That count is already handled by cstate->num_errors, so the 'skipped' variable
was redundant.

Additionally, the condition on_error != COPY_ON_ERROR_STOP is removed.
Since on_error == COPY_ON_ERROR_IGNORE is already checked, and on_error
only has two values (ignore and stop), the additional check was redundant
and made the logic harder to read. Seemingly this was introduced
in preparation for a future patch, but the current checks don’t offer
clear value and have been removed to improve readability.

Author: Atsushi Torikoshi
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24...@oss.nttdata.com
---
 src/backend/commands/copyfrom.c | 21 ++++++++-------------
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 47879994f7..9139a40785 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -657,7 +657,6 @@ CopyFrom(CopyFromState cstate)
        CopyMultiInsertInfo multiInsertInfo = {0};      /* pacify compiler */
        int64           processed = 0;
        int64           excluded = 0;
-       int64           skipped = 0;
        bool            has_before_insert_row_trig;
        bool            has_instead_insert_row_trig;
        bool            leafpart_use_multi_insert = false;
@@ -1004,26 +1003,22 @@ CopyFrom(CopyFromState cstate)
                if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
                        break;
 
-               if (cstate->opts.on_error != COPY_ON_ERROR_STOP &&
+               if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE &&
                        cstate->escontext->error_occurred)
                {
                        /*
-                        * Soft error occurred, skip this tuple and deal with error
-                        * information according to ON_ERROR.
+                        * Soft error occurred, skip this tuple and just make
+                        * ErrorSaveContext ready for the next NextCopyFrom. Since we
+                        * don't set details_wanted and error_data is not to be filled,
+                        * just resetting error_occurred is enough.
                         */
-                       if (cstate->opts.on_error == COPY_ON_ERROR_IGNORE)
-
-                               /*
-                                * Just make ErrorSaveContext ready for the next NextCopyFrom.
-                                * Since we don't set details_wanted and error_data is not to
-                                * be filled, just resetting error_occurred is enough.
-                                */
-                               cstate->escontext->error_occurred = false;
+                       cstate->escontext->error_occurred = false;
 
                        /* Report that this tuple was skipped by the ON_ERROR clause */
                        pgstat_progress_update_param(PROGRESS_COPY_TUPLES_SKIPPED,
-                                                                                ++skipped);
+                                                                                cstate->num_errors);
 
+                       /* Repeat NextCopyFrom() until no soft error occurs */
                        continue;
                }
 
-- 
2.45.2
