On 2024-02-12 15:46, Bharath Rupireddy wrote:

Thanks for your comments.

On Mon, Jan 29, 2024 at 8:41 AM torikoshia <torikos...@oss.nttdata.com> wrote:

On 2024-01-27 00:43, David G. Johnston wrote:

>> I'd like to have a new option "log", which skips soft errors and
>> logs
>> information that should have resulted in errors to PostgreSQL log.
>
> user-specified tables in the same database.  Setting up temporary
> tables or unlogged tables probably is going to be a more acceptable
> methodology than trying to get to the log files.

OTOH not everyone thinks saving table information is the best idea[2].

The added NOTICE message gives a summary of how many rows were
skipped, but not the line numbers. It's hard for the users to find the
malformed data, especially when they are bulk-inserting from data
files of multiple GBs in size (hard to open such files in any editor
too).

I understand the value of writing the info to server log or tables of
users' choice as being discussed in a couple of other threads.
However, I'd prefer we do the simplest thing first before we go debate
server log or table - let the users know what line numbers are
containing the errors that COPY ignored something like [1] with a
simple change like [2].

Agreed.
Unlike my patch, it hides the error information(i.e. 22P02: invalid input syntax for type integer: ), but I feel that it's usually sufficient to know the row number and column where the error occurred.

It not only helps users figure out which rows
and attributes were malformed, but also helps them redirect them to
server logs with setting log_min_messages = notice [3]. In the worst
case scenario, a problem with this one NOTICE per malformed row is
that it can overload the psql session if all the rows are malformed.
I'm not sure if this is a big problem, but IMO better than a single
summary NOTICE message and simpler than writing to tables of users'
choice.

Maybe could we do what you suggested for the behavior when 'log' is set to on_error?

Thoughts?

FWIW, I presented the new COPY ... ON_ERROR option feature at
Hyderabad PostgreSQL User Group meetup and it was well-received by the
audience. The audience felt it's going to be extremely helpful for
bulk-loading tasks. Thank you all for working on this feature.

Thanks for informing it!
I hope it will not be reverted as the audience:)

[1]
postgres=# CREATE TABLE check_ign_err (n int, m int[], k int);
CREATE TABLE
postgres=# COPY check_ign_err FROM STDIN WITH (on_error ignore);
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
1    {1}     1
a       {2}     2
3       {3}     3333333333
4       {a, 4}  4

5       {5}>> >> >> >> >>       5
\.>>
NOTICE:  detected data type incompatibility at line number 2 for
column check_ign_err, COPY n
NOTICE:  detected data type incompatibility at line number 3 for
column check_ign_err, COPY k
NOTICE:  detected data type incompatibility at line number 4 for
column check_ign_err, COPY m
NOTICE:  detected data type incompatibility at line number 5 for
column check_ign_err, COPY n
NOTICE:  4 rows were skipped due to data type incompatibility
COPY 2

[2]
diff --git a/src/backend/commands/copyfromparse.c
b/src/backend/commands/copyfromparse.c
index 906756362e..93ab5d4ebb 100644
--- a/src/backend/commands/copyfromparse.c
+++ b/src/backend/commands/copyfromparse.c
@@ -961,7 +961,16 @@ NextCopyFrom(CopyFromState cstate, ExprContext *econtext,

                 &values[m]))
                        {
                                cstate->num_errors++;
-                               return true;
+
+ if (cstate->opts.on_error != COPY_ON_ERROR_STOP)
+                               {
+                                       ereport(NOTICE,
+
errmsg("detected data type incompatibility at line number %llu for
column %s, COPY %s",
+
(unsigned long long) cstate->cur_lineno,
+
cstate->cur_relname,
+
cstate->cur_attname));
+                                       return true;
+                               }
                        }

                        cstate->cur_attname = NULL;

[3]
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 2 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 2, column n: "a"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 3 for column check_ign_err, COPY k
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 3, column k: "3333333333"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 4 for column check_ign_err, COPY m
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 4, column m: "{a, 4}"
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  detected data type
incompatibility at line number 5 for column check_ign_err, COPY n
2024-02-12 06:20:29.363 UTC [427892] CONTEXT:  COPY check_ign_err,
line 5, column n: ""
2024-02-12 06:20:29.363 UTC [427892] NOTICE:  4 rows were skipped due
to data type incompatibility

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation


Reply via email to