Re: Conflict handling for COPY FROM

Alexey Kondratov Tue, 02 Jul 2019 09:57:47 -0700

On 28.06.2019 16:12, Alvaro Herrera wrote:

On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <and...@anarazel.de> wrote:

Or even just return it as a row. CopyBoth is relatively widely supported
these days.

i think generating warning about it also sufficiently meet its propose of
notifying user about skipped record with existing logging facility
and we use it for similar propose in other place too. The different
i see is the number of warning that can be generated

Warnings seem useless for this purpose.  I'm with Andres: returning rows
would make this a fine feature.  If the user wants the rows in a table
as Andrew suggests, she can use wrap the whole thing in an insert.

I agree with previous commentators that returning rows will make thisfeature more versatile. Though, having a possibility to simply skipconflicting/malformed rows is worth of doing from my perspective.However, pushing every single skipped row to the client as a separatedWARNING will be too much for a bulk import. So maybe just overall statsabout skipped rows number will be enough?

Also, I would prefer having an option to ignore all errors, e.g. withoption ERROR_LIMIT set to -1. Because it is rather difficult to estimatea number of future errors if you are playing with some badly structureddata, while always setting it to 100500k looks ugly.

Anyway, below are some issues with existing code after a brief review ofthe patch:

1) Calculation of processed rows isn't correct (I've checked). You do itin two places, and


-            processed++;
+            if (!cstate->error_limit)
+                processed++;

is never incremented if ERROR_LIMIT is specified and no errorsoccurred/no constraints exist, so the result will always be 0. However,if primary column with constraints exists, then processed is calculatedcorrectly, since another code path is used:


+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;

I would prefer this calculation in a single place (as it was beforepatch) for simplicity and in order to avoid such problems.

2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMITis specified and was exceeded, which doesn't seem to be correct, does it?


-                        if (resultRelInfo->ri_NumIndices > 0)

+ if (resultRelInfo->ri_NumIndices > 0 &&cstate->error_limit == 0)

                             recheckIndexes = ExecInsertIndexTuples(myslot,


3) Trailing whitespaces added to error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),

+ errmsg("skipping \"%s\" --- missing datafor column \"%s\" ",


+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2001    231    \N    \N"

Otherwise, the patch applies/compiles cleanly and regression tests arepassed.



Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company

Re: Conflict handling for COPY FROM

Reply via email to