Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-06-12 Thread Peter Geoghegan
On Mon, Jun 12, 2017 at 3:52 AM, Alexey Kondratov wrote:
> I am not going to start with "speculative insertion" right now, but it would be very
> useful if you could give me a pointer on where to start. Maybe I will at least try to evaluate
> the complexity of the problem.

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-06-12 Thread Alexey Kondratov
Thank you for your comments Peter, there are some points that I did not think about before.

On 9 Jun 2017, at 01:09, Peter Geoghegan wrote:
Adding a full support of ON CONFLICT DO NOTHING/UPDATE to COPY seems to be a large separate task and is out of the current project scope,

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-06-08 Thread Peter Geoghegan
On Wed, Jun 7, 2017 at 12:34 PM, Alex K wrote:
> (1) One of my mentors--Alvaro Herrera--suggested that I have a look at UPSERT.
> It may be a good point to be able to achieve the same functionality
> as during the ON CONFLICT DO NOTHING, when COPY actually

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-06-07 Thread Alex K
Hi pgsql-hackers,

Thank you again for all these replies. I have started working on this project and learnt a lot of new stuff last month, so here are some new thoughts about ERRORS handling in COPY. I decided to stick to the same thread, since it has a neutral subject.

(1) One of my

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-12 Thread Craig Ringer
On 13 April 2017 at 01:57, Stas Kelvich wrote:
> However I think it is worth a quick investigation whether it is possible to create a special
> code path for COPY in which errors don’t cancel the transaction.

Not really. Anything at any layer of the system expects to be able to

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-12 Thread Stas Kelvich
> On 12 Apr 2017, at 20:23, Robert Haas wrote:
>
> On Wed, Apr 12, 2017 at 1:18 PM, Nicolas Barbier wrote:
>> 2017-04-11 Robert Haas :
>>> If the data quality is poor (say, 50% of lines have errors) it's
>>> almost

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-12 Thread Robert Haas
On Wed, Apr 12, 2017 at 1:18 PM, Nicolas Barbier wrote:
> 2017-04-11 Robert Haas :
>> There's a nasty trade-off here between XID consumption (and the
>> aggressive vacuums it eventually causes) and preserving performance in
>> the face of errors -

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-12 Thread Nicolas Barbier
2017-04-11 Robert Haas :
> There's a nasty trade-off here between XID consumption (and the
> aggressive vacuums it eventually causes) and preserving performance in
> the face of errors - e.g. if you make k = 100,000 you consume 100x
> fewer XIDs than if you make k = 1000,
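The trade-off quoted above can be put into rough numbers. This is only a minimal sketch, assuming that each batch of k rows runs in its own subtransaction, that each such subtransaction consumes one XID, and that an error aborts the batch it occurs in. The input size and both function names are hypothetical, chosen for illustration.

```python
import math


def subtransactions_needed(total_rows: int, batch_size: int) -> int:
    """Subtransactions (assumed: one XID each) for an error-free load of total_rows."""
    return math.ceil(total_rows / batch_size)


def max_rows_redone_per_error(batch_size: int) -> int:
    """Assumption: an error aborts its batch, so up to batch_size - 1 rows are redone."""
    return batch_size - 1


total_rows = 100_000_000  # hypothetical input size

for k in (1_000, 100_000):
    print(
        f"k={k}: {subtransactions_needed(total_rows, k)} XIDs, "
        f"up to {max_rows_redone_per_error(k)} rows redone per error"
    )
```

Under these assumptions the larger batch size uses 100x fewer XIDs, at the price of more wasted work each time a bad row is hit.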

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-11 Thread Robert Haas
On Mon, Apr 10, 2017 at 2:46 PM, Alexey Kondratov wrote:
> Yes, sure, I don't doubt it. The question was around step 4 in the following
> possible algorithm:
>
> 1. Suppose we have to insert N records
> 2. Start subtransaction with these N records
> 3. Error is

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-10 Thread Alexey Kondratov
Yes, sure, I don't doubt it. The question was around step 4 in the following possible algorithm:

1. Suppose we have to insert N records
2. Start subtransaction with these N records
3. Error is raised on k-th line
4. Then, we know that we can safely insert all lines from the 1st till (k - 1)
5.
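Steps 1 to 4 above can be sketched as a small in-memory simulation. This is not PostgreSQL code: the "table" is a Python list, a subtransaction rollback is modelled by discarding that list, and `RowError`, `insert_row` and `load_batch` are hypothetical names. The sketch stops after step 4, since the message is cut off before step 5.

```python
class RowError(Exception):
    """Stand-in for any error raised while inserting a row."""


def insert_row(staged, row):
    # Hypothetical validity rule: only integers are accepted.
    if not isinstance(row, int):
        raise RowError(row)
    staged.append(row)


def load_batch(rows):
    """Return (inserted_rows, error_index, subtransactions_used).

    error_index is the 0-based position of the failing row, or None.
    """
    # Steps 1-2: try all N rows inside one (simulated) subtransaction.
    staged = []
    subxacts = 1
    try:
        for k, row in enumerate(rows):
            insert_row(staged, row)
    except RowError:
        # Step 3: the error on row k rolls back everything staged so far.
        # Step 4: rows before k are known to be good, so they are
        # re-inserted in a fresh subtransaction.
        staged = []
        subxacts += 1
        for row in rows[:k]:
            insert_row(staged, row)
        return staged, k, subxacts
    return staged, None, subxacts


print(load_batch([1, 2, "bad", 4]))
print(load_batch([1, 2, 3]))
```

Note that the rows before the failing one are processed twice, which is the redo cost discussed elsewhere in the thread.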

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-10 Thread Robert Haas
On Mon, Apr 10, 2017 at 11:39 AM, Alex K wrote:
> (1) It seems that starting new subtransaction at step 4 is not necessary. We
> can just gather all error lines in one pass and at the end of input start
> the only one additional subtransaction with all safe-lines at

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-10 Thread Alex K
Hi Alexander!

I missed your reply, since the proposal submission deadline passed last Monday and I haven't been checking the hackers mailing list very frequently.

(1) It seems that starting new subtransaction at step 4 is not necessary. We can just gather all error lines in one pass and at the end of
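One possible reading of the "gather all error lines in one pass" idea is sketched below. It assumes that bad lines can be recognised by parsing alone, without touching the table; errors that only show up at insert time, such as unique-constraint violations, are not covered by this sketch. `split_lines` and the use of `int` as the parser are hypothetical.

```python
def split_lines(lines, parse):
    """Single pass over the input: collect parsed safe rows and error lines.

    Returns (safe_rows, errors), where errors holds (line_number, message)
    pairs with 1-based line numbers.
    """
    safe, errors = [], []
    for lineno, line in enumerate(lines, start=1):
        try:
            safe.append(parse(line))
        except ValueError as exc:
            errors.append((lineno, str(exc)))
    return safe, errors


safe_rows, error_lines = split_lines(["1", "x", "3"], int)
# Under this reading, safe_rows would then be inserted in a single
# additional subtransaction, and error_lines reported to the user.
print(safe_rows, [lineno for lineno, _ in error_lines])
```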

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-04-06 Thread Alexander Korotkov
Hi, Alexey!

On Tue, Mar 28, 2017 at 1:54 AM, Alexey Kondratov <kondratov.alek...@gmail.com> wrote:
> Thank you for your responses and valuable comments!
>
> I have written draft proposal https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit
>
> It seems that

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-27 Thread Alexey Kondratov
Pavel, Craig and Stas,

Thank you for your responses and valuable comments! I have written a draft proposal https://docs.google.com/document/d/1Y4mc_PCvRTjLsae-_fhevYfepv4sxaqwhOo4rlxvK1c/edit

It seems that

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Stas Kelvich
> On 23 Mar 2017, at 15:53, Craig Ringer wrote:
>
> On 23 March 2017 at 19:33, Alexey Kondratov wrote:
>
>> (1) Add errors handling to COPY as a minimum program
>
> Huge +1 if you can do it in an efficient way.
>
> I think the main

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Craig Ringer
On 23 March 2017 at 19:33, Alexey Kondratov wrote:
> (1) Add errors handling to COPY as a minimum program

Huge +1 if you can do it in an efficient way. I think the main barrier to doing so is that the naïve approach creates a subtransaction for every row, which is

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Pavel Stehule
>> 1) Is there anyone outside the PG community who would be interested in such a
>> project and could be a mentor?
>> 2) These two points share a general idea – to simplify work with a large
>> amount of data from different sources, but maybe it would be better to
>> focus on a single task?
>>
>
> I spent

Re: [HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Pavel Stehule
Hi

2017-03-23 12:33 GMT+01:00 Alexey Kondratov :
> Hi pgsql-hackers,
>
> I'm planning to apply to GSOC'17 and my proposal consists currently of two
> parts:
>
> (1) Add errors handling to COPY as a minimum program
>
> Motivation: Using PG on a daily basis for years

[HACKERS] GSOC'17 project introduction: Parallel COPY execution with errors handling

2017-03-23 Thread Alexey Kondratov
Hi pgsql-hackers,

I'm planning to apply to GSOC'17 and my proposal consists currently of two parts:

(1) Add errors handling to COPY as a minimum program

Motivation: Using PG on a daily basis for years I found that there are some cases when you need to load (e.g. for further analytics) a