Hi, Alexey!

On Tue, Mar 28, 2017 at 1:54 AM, Alexey Kondratov <[email protected]> wrote:
> Thank you for your responses and valuable comments!
>
> I have written a draft proposal: https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit
>
> It seems that COPY is currently able to return the first error line and the error
> type (extra or missing columns, type parse error, etc.).
> Thus, an approach similar to what Stas wrote should work and, being
> optimised for a small number of error rows, should not
> affect COPY performance in that case.
>
> I will be glad to receive any critical remarks and suggestions.

I have the following questions about your proposal.

> 1. Suppose we have to insert N records
> 2. We create a subtransaction with these N records
> 3. An error is raised on the k-th line
> 4. Then, we can safely insert all lines from the 1st up to the (k - 1)-th
> 5. Report, save to an errors table, or silently drop the k-th line
> 6. Next, try to insert lines from (k + 1) to N in another subtransaction
> 7. Repeat until the end of file

Do you assume that we start a new subtransaction in step 4, since the subtransaction we started in step 2 is rolled back?

> I am planning to use background worker processes for parallel COPY
> execution. Each process will receive an equal piece of the input file. Since
> the file is split by size, not by lines, each worker will start import from
> the first new line so as not to hit a broken line.

I think the situation where the backend reads a file directly during COPY is not typical. A more typical case is the \copy psql command. In that case, "COPY ... FROM stdin;" is actually executed while psql streams the data. How can we apply parallel COPY in this case?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
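P.S. To make sure we are reading the retry loop the same way, here is a rough sketch of steps 1-7 as I understand them. This is a plain Python simulation, not PostgreSQL internals: `try_insert_batch`, `report_error`, and `BatchError` are stand-ins I made up, where each `try_insert_batch` call models one subtransaction that either commits all given lines or rolls everything back and reports the index of the first bad line.

```python
class BatchError(Exception):
    """Raised by a simulated subtransaction at the first bad line."""
    def __init__(self, index):
        super().__init__(index)
        self.index = index


def copy_skip_errors(lines, try_insert_batch, report_error):
    """Simulate the proposed per-batch retry loop for COPY error skipping."""
    start = 0
    inserted = 0
    while start < len(lines):
        batch = lines[start:]
        try:
            try_insert_batch(batch)                  # steps 1-2: one subtransaction
            inserted += len(batch)
            break                                    # no error: everything committed
        except BatchError as e:                      # step 3: error on the k-th line
            good = batch[:e.index]
            if good:
                try_insert_batch(good)               # step 4: fresh subtransaction for 1..k-1
                inserted += len(good)
            report_error(start + e.index, batch[e.index])  # step 5: report/drop bad line
            start += e.index + 1                     # step 6: resume from line k+1
    return inserted
```

On input with two bad lines this performs four subtransactions in total (two failed, two committed, plus the final one for the tail), which is why the approach stays cheap only while the number of error rows is small.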
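P.P.S. And to confirm my reading of the splitting scheme: each worker gets a nominal byte offset, skips forward to the first line boundary at or after that offset, and the previous worker keeps reading past its nominal end until its current line is finished. A sketch with in-memory bytes rather than a real file (`worker_ranges` is a hypothetical helper, not anything from your patch):

```python
def worker_ranges(data, nworkers):
    """Split data into nworkers byte ranges, aligning each worker's start
    to the first line boundary at or after its nominal offset, so that no
    worker begins in the middle of a (possibly broken) line."""
    size = len(data)
    chunk = size // nworkers
    starts = [0]                                  # worker 0 always starts at the beginning
    for w in range(1, nworkers):
        off = w * chunk
        # Searching from off - 1 means that if the previous byte is already
        # a newline, the worker starts exactly at off; otherwise it skips
        # the partial line, which the previous worker will finish reading.
        nl = data.find(b"\n", off - 1)
        starts.append(size if nl == -1 else nl + 1)
    ends = starts[1:] + [size]
    return list(zip(starts, ends))
```

With this alignment the ranges still cover the whole file exactly once, and every non-empty range both starts at a line boundary and ends right after one.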
