Hi, Alexey!

On Tue, Mar 28, 2017 at 1:54 AM, Alexey Kondratov <[email protected]> wrote:
> Thank you for your responses and valuable comments!
>
> I have written a draft proposal: https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit
>
> It seems that COPY is currently able to return the first error line and the error
> type (extra or missing columns, type parse error, etc.).
> Thus, an approach similar to what Stas wrote should work and, being
> optimised for a small number of error rows, should not
> affect COPY performance in that case.
>
> I will be glad to receive any critical remarks and suggestions.

I have the following questions about your proposal.

> 1. Suppose we have to insert N records
> 2. We create a subtransaction with these N records
> 3. An error is raised on the k-th line
> 4. Then, we can safely insert all lines from the 1st up to the (k - 1)-th
> 5. Report, save to an errors table, or silently drop the k-th line
> 6. Next, try to insert lines from (k + 1) to N in another subtransaction
> 7. Repeat until the end of file

Do you assume that we start a new subtransaction in step 4, since the subtransaction we started in step 2 is rolled back?

> I am planning to use background worker processes for parallel COPY
> execution. Each process will receive an equal piece of the input file. Since
> the file is split by size, not by lines, each worker will start import from
> the first new line so as not to hit a broken line.

I think the situation where the backend reads a file directly during COPY is not typical. A more typical case is the \copy psql command. In that case, "COPY ... FROM stdin;" is actually executed while psql streams the data. How can we apply parallel COPY in this case?

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
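P.S. To make sure we are reading the retry loop the same way, here is a rough sketch of steps 1-7 as I understand them. This is a plain Python simulation, not PostgreSQL internals: `try_insert_batch`, `report_error`, and `BatchError` are stand-ins I made up, where each `try_insert_batch` call models one subtransaction that either commits all given lines or rolls everything back and reports the index of the first bad line.

```python
class BatchError(Exception):
    """Raised by a simulated subtransaction at the first bad line."""
    def __init__(self, index):
        super().__init__(index)
        self.index = index


def copy_skip_errors(lines, try_insert_batch, report_error):
    """Simulate the proposed per-batch retry loop for COPY error skipping."""
    start = 0
    inserted = 0
    while start < len(lines):
        batch = lines[start:]
        try:
            try_insert_batch(batch)                  # steps 1-2: one subtransaction
            inserted += len(batch)
            break                                    # no error: everything committed
        except BatchError as e:                      # step 3: error on the k-th line
            good = batch[:e.index]
            if good:
                try_insert_batch(good)               # step 4: fresh subtransaction for 1..k-1
                inserted += len(good)
            report_error(start + e.index, batch[e.index])  # step 5: report/drop bad line
            start += e.index + 1                     # step 6: resume from line k+1
    return inserted
```

On input with two bad lines this performs four subtransactions in total (two failed, two committed, plus the final one for the tail), which is why the approach stays cheap only while the number of error rows is small.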
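P.P.S. And to confirm my reading of the splitting scheme: each worker gets a nominal byte offset, skips forward to the first line boundary at or after that offset, and the previous worker keeps reading past its nominal end until its current line is finished. A sketch with in-memory bytes rather than a real file (`worker_ranges` is a hypothetical helper, not anything from your patch):

```python
def worker_ranges(data, nworkers):
    """Split data into nworkers byte ranges, aligning each worker's start
    to the first line boundary at or after its nominal offset, so that no
    worker begins in the middle of a (possibly broken) line."""
    size = len(data)
    chunk = size // nworkers
    starts = [0]                                  # worker 0 always starts at the beginning
    for w in range(1, nworkers):
        off = w * chunk
        # Searching from off - 1 means that if the previous byte is already
        # a newline, the worker starts exactly at off; otherwise it skips
        # the partial line, which the previous worker will finish reading.
        nl = data.find(b"\n", off - 1)
        starts.append(size if nl == -1 else nl + 1)
    ends = starts[1:] + [size]
    return list(zip(starts, ends))
```

With this alignment the ranges still cover the whole file exactly once, and every non-empty range both starts at a line boundary and ends right after one.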
