Thank you for your comments, Peter; there are some points that I had not thought about before.
I am not going to start on "speculative insertion" right now, but it would be very
useful if you could give me a pointer on where to start. Maybe I will at least try to
evaluate the complexity of the problem.
Initially I was thinking only about malformed rows, e.g. too few or extra columns.
Honestly, I did not know that there are so many levels and ways in which errors
can occur. So currently (and especially after your comments) I prefer to focus
only on the following list of errors:
1) File format issues
a. Fewer columns than needed
b. Extra columns
2) I have doubts about type mismatch. It is possible to imagine a situation where,
e.g., some integers are exported as int and some as "int", but I am not sure
that it is a common situation.
3) Some constraint violations, e.g. unique index.
The first appeared to be easily achievable without subtransactions. I have created a
proof-of-concept version of COPY, where the error handling is turned on by default.
Please see the small patch attached (applicable to 76b11e8a43eca4612dfccfe7f3ebd293fb8a46ec)
or the GUI version on GitHub https://github.com/ololobus/postgres/pull/1/files.
It throws warnings instead of errors for malformed lines with too few or extra columns
and reports the line number.
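
To illustrate the idea outside of the backend, here is a minimal standalone sketch in C. It is not the attached patch and it does not use any PostgreSQL internals: the function names count_fields and check_row are hypothetical, the warning wording is made up, and fields are assumed to be separated by a single delimiter character with no quoting or escaping.

```c
#include <assert.h>
#include <stdio.h>

/* Count delimiter-separated fields in one line (no quoting or escaping). */
int count_fields(const char *line, char delim)
{
    assert(line != NULL);

    int n = 1;
    for (const char *p = line; *p != '\0'; p++)
        if (*p == delim)
            n++;
    return n;
}

/*
 * Return 1 if the line has exactly `expected` fields. Otherwise print a
 * warning that includes the line number and return 0, so that the caller
 * can skip the row instead of aborting the whole load.
 */
int check_row(const char *line, int lineno, int expected, char delim)
{
    int got = count_fields(line, delim);

    if (got < expected) {
        fprintf(stderr, "WARNING: line %d: missing data, got %d of %d columns\n",
                lineno, got, expected);
        return 0;
    }
    if (got > expected) {
        fprintf(stderr, "WARNING: line %d: extra data, got %d instead of %d columns\n",
                lineno, got, expected);
        return 0;
    }
    return 1;
}
```

A caller would read the file line by line, call check_row for each line, and pass only the accepted lines on to the insertion step.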
The second is probably achievable without subtransactions via PG_TRY/PG_CATCH
around heap_form_tuple, since the tuple is not yet inserted into the heap.
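
A rough standalone analogue of that control flow, using plain setjmp/longjmp instead of the real PG_TRY/PG_CATCH macros, could look like the sketch below. Everything in it is an assumption made for illustration: parse_int_or_throw and try_convert_row are hypothetical names, all columns are assumed to be integers, and real backend code would also have to deal with memory contexts and the error state, which are ignored here.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <setjmp.h>
#include <stdlib.h>

/* Stand-in for the error context that a try block would establish. */
static jmp_buf parse_error;

/*
 * Convert text to int. On bad input, jump to the handler instead of
 * returning, similar to an error raised during type conversion.
 */
int parse_int_or_throw(const char *text)
{
    assert(text != NULL);

    char *end = NULL;
    errno = 0;
    long v = strtol(text, &end, 10);

    if (end == text || *end != '\0' || errno == ERANGE ||
        v < INT_MIN || v > INT_MAX)
        longjmp(parse_error, 1);
    return (int) v;
}

/*
 * Try to convert every field of a row. Returns 1 on success and 0 if any
 * field failed to convert. Nothing has been stored anywhere at this point,
 * so a failed row can simply be skipped.
 */
int try_convert_row(const char *const *fields, int nfields, int *out)
{
    if (setjmp(parse_error) != 0)
        return 0;               /* "catch" branch: skip this row */

    for (int i = 0; i < nfields; i++)
        out[i] = parse_int_or_throw(fields[i]);
    return 1;
}
```

With this sketch, a row whose second field is the quoted text "42" (quotes included) is rejected as a whole, while a row of plain integers is converted normally.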
But the third is questionable without subtransactions, since even if we check
constraints once, there may be various before/after triggers which can modify the
tuple so that it no longer satisfies them. The corresponding comment inside copy.c states:
"Note that a BR trigger might modify the tuple such that the partition constraint is
no longer satisfied, so we need to check in that case." Thus, as I understand it, there
may be different situations here. However, this is a point where "speculative insertion"
is able to help.
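
The ordering problem can be shown with a small standalone sketch in C. It only models a row-local check that is re-run after a "before" trigger; it says nothing about unique indexes, which depend on what is already stored. The types Row, RowCheck and RowTrigger and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct Row {
    int key;
    int value;
} Row;

typedef int  (*RowCheck)(const Row *row);   /* returns 1 if the row is valid */
typedef void (*RowTrigger)(Row *row);       /* may modify the row in place */

/*
 * Return 1 if the row may be inserted, 0 if it must be rejected.
 * The check runs once up front and again after the trigger, because
 * the trigger may have changed the row so that it no longer passes.
 */
int row_is_insertable(Row *row, RowCheck check, RowTrigger before_trigger)
{
    assert(row != NULL && check != NULL);

    if (!check(row))
        return 0;
    if (before_trigger != NULL) {
        before_trigger(row);
        if (!check(row))
            return 0;
    }
    return 1;
}

/* Hypothetical constraint: key must be in [0, 100). */
int key_in_range(const Row *row)
{
    return row->key >= 0 && row->key < 100;
}

/* Hypothetical trigger that moves the key out of the allowed range. */
void shift_key(Row *row)
{
    row->key += 100;
}
```

A row with key 5 passes the first check, but with shift_key as the trigger it fails the second one, which is the situation the quoted comment describes.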
These three cases should cover most real-life scenarios.
Now, I have some doubts about it too. If there is an encoding problem,
it is probably about the whole file, not only a few rows.