On 28.06.2019 16:12, Alvaro Herrera wrote:
On Wed, Feb 20, 2019 at 7:04 PM Andres Freund <and...@anarazel.de> wrote:
Or even just return it as a row. CopyBoth is relatively widely supported
these days.
i think generating warning about it also sufficiently meet its propose of
notifying user about skipped record with existing logging facility
and we use it for similar propose in other place too. The different
i see is the number of warning that can be generated
Warnings seem useless for this purpose.  I'm with Andres: returning rows
would make this a fine feature.  If the user wants the rows in a table
as Andrew suggests, she can use wrap the whole thing in an insert.

I agree with previous commentators that returning rows will make this feature more versatile. Though, having a possibility to simply skip conflicting/malformed rows is worth of doing from my perspective. However, pushing every single skipped row to the client as a separated WARNING will be too much for a bulk import. So maybe just overall stats about skipped rows number will be enough?

Also, I would prefer having an option to ignore all errors, e.g. with option ERROR_LIMIT set to -1. Because it is rather difficult to estimate a number of future errors if you are playing with some badly structured data, while always setting it to 100500k looks ugly.

Anyway, below are some issues with existing code after a brief review of the patch:

1) Calculation of processed rows isn't correct (I've checked). You do it in two places, and

-            processed++;
+            if (!cstate->error_limit)
+                processed++;

is never incremented if ERROR_LIMIT is specified and no errors occurred/no constraints exist, so the result will always be 0. However, if primary column with constraints exists, then processed is calculated correctly, since another code path is used:

+                        if (specConflict)
+                        {
+                            ...
+                        }
+                        else
+                            processed++;

I would prefer this calculation in a single place (as it was before patch) for simplicity and in order to avoid such problems.


2) This ExecInsertIndexTuples call is only executed now if ERROR_LIMIT is specified and was exceeded, which doesn't seem to be correct, does it?

-                        if (resultRelInfo->ri_NumIndices > 0)
+                        if (resultRelInfo->ri_NumIndices > 0 && cstate->error_limit == 0)
                             recheckIndexes = ExecInsertIndexTuples(myslot,


3) Trailing whitespaces added to error messages and tests for some reason:

+                    ereport(WARNING,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("skipping \"%s\" --- missing data for column \"%s\" ",

+                    ereport(ERROR,
+                            (errcode(ERRCODE_BAD_COPY_FILE_FORMAT),
+                             errmsg("missing data for column \"%s\" ",

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2000    230    23    23"

-ERROR:  missing data for column "e"
+ERROR:  missing data for column "e"
 CONTEXT:  COPY x, line 1: "2001    231    \N    \N"


Otherwise, the patch applies/compiles cleanly and regression tests are passed.


Regards

--
Alexey Kondratov

Postgres Professional https://www.postgrespro.com
Russian Postgres Company



Reply via email to