Re: Fast COPY FROM based on batch insert

2022-10-31 Thread Etsuro Fujita
On Fri, Oct 28, 2022 at 7:53 PM Andrey Lepikhov wrote:
> On 28/10/2022 16:12, Etsuro Fujita wrote:
> > I think there is another patch that improves performance of COPY FROM
> > for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
> > else) want to work on it again, I think it would be better to create a
> > new CF entry for it (and start a new thread for it).  So I plan to
> > close this in the November CF unless they think otherwise.

> I studied the performance of this code in comparison to bulk INSERTs.
> This patch seems to improve insertion speed by about 20%. Also, this
> patch is very invasive, so I don't have any plans to work on it now.

Ok, let's leave that for future work.  I closed this entry in the November CF.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-10-28 Thread Andrey Lepikhov

On 28/10/2022 16:12, Etsuro Fujita wrote:

On Thu, Oct 13, 2022 at 6:58 PM Etsuro Fujita  wrote:

I have committed the patch after tweaking comments a little bit further.


I think there is another patch that improves performance of COPY FROM
for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
else) want to work on it again, I think it would be better to create a
new CF entry for it (and start a new thread for it).  So I plan to
close this in the November CF unless they think otherwise.

Anyway, thanks for the patch, Andrey!  Thanks for reviewing, Ian and Zhihong!

Thanks,

I studied the performance of this code in comparison to bulk INSERTs.
This patch seems to improve insertion speed by about 20%. Also, this
patch is very invasive, so I don't have any plans to work on it now.


--
regards,
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-28 Thread Etsuro Fujita
On Thu, Oct 13, 2022 at 6:58 PM Etsuro Fujita  wrote:
> I have committed the patch after tweaking comments a little bit further.

I think there is another patch that improves performance of COPY FROM
for foreign tables using COPY FROM STDIN, but if Andrey (or anyone
else) want to work on it again, I think it would be better to create a
new CF entry for it (and start a new thread for it).  So I plan to
close this in the November CF unless they think otherwise.

Anyway, thanks for the patch, Andrey!  Thanks for reviewing, Ian and Zhihong!

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-10-13 Thread Etsuro Fujita
On Thu, Oct 13, 2022 at 1:38 PM Andrey Lepikhov wrote:
> On 10/12/22 07:56, Etsuro Fujita wrote:
> > On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov wrote:
> >> I reviewed the patch one more time. Only one question: bistate and
> >> ri_FdwRoutine are strongly coupled. Maybe add an assertion on
> >> (ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.
> >
> > You mean the bistate member of CopyMultiInsertBuffer?
> Yes
> >
> > We do not use that member at all for foreign tables, so the patch
> > avoids initializing that member in CopyMultiInsertBufferInit() when
> > called for a foreign table.  So we have bistate = NULL for foreign
> > tables (and bistate != NULL for plain tables), as you mentioned above.
> > I think it is a good idea to add such assertions.  How about adding
> > them to CopyMultiInsertBufferFlush() and
> > CopyMultiInsertBufferCleanup() like the attached?  In the attached I
> > updated comments a bit further as well.
> Yes, quite enough.

I have committed the patch after tweaking comments a little bit further.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-10-12 Thread Andrey Lepikhov

On 10/12/22 07:56, Etsuro Fujita wrote:

On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov wrote:

I reviewed the patch one more time. Only one question: bistate and
ri_FdwRoutine are strongly coupled. Maybe add an assertion on
(ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.


You mean the bistate member of CopyMultiInsertBuffer?

Yes


We do not use that member at all for foreign tables, so the patch
avoids initializing that member in CopyMultiInsertBufferInit() when
called for a foreign table.  So we have bistate = NULL for foreign
tables (and bistate != NULL for plain tables), as you mentioned above.
I think it is a good idea to add such assertions.  How about adding
them to CopyMultiInsertBufferFlush() and
CopyMultiInsertBufferCleanup() like the attached?  In the attached I
updated comments a bit further as well.

Yes, quite enough.

--
Regards
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-11 Thread Etsuro Fujita
On Tue, Oct 11, 2022 at 3:06 PM Andrey Lepikhov wrote:
> I reviewed the patch one more time. Only one question: bistate and
> ri_FdwRoutine are strongly coupled. Maybe add an assertion on
> (ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.

You mean the bistate member of CopyMultiInsertBuffer?

We do not use that member at all for foreign tables, so the patch
avoids initializing that member in CopyMultiInsertBufferInit() when
called for a foreign table.  So we have bistate = NULL for foreign
tables (and bistate != NULL for plain tables), as you mentioned above.
I think it is a good idea to add such assertions.  How about adding
them to CopyMultiInsertBufferFlush() and
CopyMultiInsertBufferCleanup() like the attached?  In the attached I
updated comments a bit further as well.

Thanks for reviewing!

Best regards,
Etsuro Fujita


v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-5.patch
Description: Binary data


Re: Fast COPY FROM based on batch insert

2022-10-11 Thread Andrey Lepikhov

On 10/7/22 11:18, Etsuro Fujita wrote:

On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita  wrote:

I will review the patch a bit more, but I feel that it is
in good shape.


One thing I noticed is this bit added to CopyMultiInsertBufferFlush()
to run triggers on the foreign table.

+   /* Run AFTER ROW INSERT triggers */
+   if (resultRelInfo->ri_TrigDesc != NULL &&
+   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+   {
+   Oid relid =
RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+   for (i = 0; i < inserted; i++)
+   {
+   TupleTableSlot *slot = rslots[i];
+
+   /*
+* AFTER ROW Triggers might reference the tableoid column,
+* so (re-)initialize tts_tableOid before evaluating them.
+*/
+   slot->tts_tableOid = relid;
+
+   ExecARInsertTriggers(estate, resultRelInfo,
+slot, NIL,
+cstate->transition_capture);
+   }
+   }

Since foreign tables cannot have transition tables, we have
trig_insert_new_table=false.  So I simplified the if test and added an
assertion ensuring trig_insert_new_table=false.  Attached is a new
version of the patch.  I tweaked some comments a bit as well.  I think
the patch is committable.  So I plan on committing it next week if
there are no objections.
I reviewed the patch one more time. Only one question: bistate and
ri_FdwRoutine are strongly coupled. Maybe add an assertion on
(ri_FdwRoutine XOR bistate)? Just to prevent possible errors in the future.


--
Regards
Andrey Lepikhov
Postgres Professional





Re: Fast COPY FROM based on batch insert

2022-10-07 Thread Etsuro Fujita
On Tue, Sep 27, 2022 at 6:03 PM Etsuro Fujita  wrote:
> I will review the patch a bit more, but I feel that it is
> in good shape.

One thing I noticed is this bit added to CopyMultiInsertBufferFlush()
to run triggers on the foreign table.

+   /* Run AFTER ROW INSERT triggers */
+   if (resultRelInfo->ri_TrigDesc != NULL &&
+   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
+resultRelInfo->ri_TrigDesc->trig_insert_new_table))
+   {
+   Oid relid =
RelationGetRelid(resultRelInfo->ri_RelationDesc);
+
+   for (i = 0; i < inserted; i++)
+   {
+   TupleTableSlot *slot = rslots[i];
+
+   /*
+* AFTER ROW Triggers might reference the tableoid column,
+* so (re-)initialize tts_tableOid before evaluating them.
+*/
+   slot->tts_tableOid = relid;
+
+   ExecARInsertTriggers(estate, resultRelInfo,
+slot, NIL,
+cstate->transition_capture);
+   }
+   }

Since foreign tables cannot have transition tables, we have
trig_insert_new_table=false.  So I simplified the if test and added an
assertion ensuring trig_insert_new_table=false.  Attached is a new
version of the patch.  I tweaked some comments a bit as well.  I think
the patch is committable.  So I plan on committing it next week if
there are no objections.

Best regards,
Etsuro Fujita


v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-4.patch
Description: Binary data


Re: Fast COPY FROM based on batch insert

2022-09-27 Thread Etsuro Fujita
On Wed, Aug 10, 2022 at 5:30 PM Etsuro Fujita  wrote:
> On Wed, Aug 10, 2022 at 1:06 AM Zhihong Yu  wrote:
> > +   /* If any rows were inserted, run AFTER ROW INSERT triggers. */
> > ...
> > +   for (i = 0; i < inserted; i++)
> > +   {
> > +   TupleTableSlot *slot = rslots[i];
> > ...
> > +   slot->tts_tableOid =
> > +   RelationGetRelid(resultRelInfo->ri_RelationDesc);
> >
> > It seems the return value of 
> > `RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a 
> > variable outside the for loop.
> > Inside the for loop, assign this variable to slot->tts_tableOid.
>
> Actually, I did this to match the code in ExecBatchInsert(), but that
> seems like a good idea, so I’ll update the patch as such in the next
> version.

Done.  I also adjusted the code in CopyMultiInsertBufferFlush() a bit
further.  No functional changes.  I put back in the original position
an assertion ensuring the FDW supports batching.  Sorry for the back
and forth.  Attached is an updated version of the patch.

Other changes are:

* The previous patch modified postgres_fdw.sql so that the existing
test cases for COPY FROM were tested in batch-insert mode.  But I
think we should keep them as-is to test the default behavior, so I
added test cases for this feature by copying-and-pasting some of the
existing test cases.  Also, the previous patch added this:

+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+   server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+   server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+   begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+   for each row execute function print_new_row();
+
+copy foo from stdin;
+1
+2
+\.

Rather than doing so, I think it would be better to use a partitioned
table defined in the above section “test tuple routing for
foreign-table partitions”, to save cycles.  So I changed this as such.

* I modified comments a bit further and updated docs.

That is it.  I will review the patch a bit more, but I feel that it is
in good shape.

Best regards,
Etsuro Fujita


v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-3.patch
Description: Binary data


Re: Fast COPY FROM based on batch insert

2022-09-27 Thread Etsuro Fujita
On Tue, Aug 23, 2022 at 2:58 PM Andrey Lepikhov wrote:
> On 22/8/2022 11:44, Etsuro Fujita wrote:
> > I think the latter is more consistent with the existing error context
> > information when in CopyMultiInsertBufferFlush().  Actually, I thought
> > this too, and I think this would be useful when the COPY FROM command
> > is executed on a foreign table.  My concern, however, is the case when
> > the command is executed on a partitioned table containing foreign
> > partitions; in that case the input data would not always be sorted in
> > the partition order, so the range for an error-occurring foreign
> > partition might contain many lines with rows from other partitions,
> > which I think makes the range information less useful.  Maybe I'm too
> > worried about that, though.

> I got your point. Indeed, perhaps such info doesn't really need to be
> included in core, at least for now.

Ok.  Sorry for the late response.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-08-22 Thread Andrey Lepikhov

On 22/8/2022 11:44, Etsuro Fujita wrote:

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush().  Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table.  My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful.  Maybe I'm too
worried about that, though.
I got your point. Indeed, perhaps such info doesn't really need to be
included in core, at least for now.


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-08-22 Thread Etsuro Fujita
On Mon, Aug 15, 2022 at 2:29 PM Andrey Lepikhov wrote:
> On 8/9/22 16:44, Etsuro Fujita wrote:
> >>> -1 foo
> >>> 1 bar
> >>> \.
> > ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
> > DETAIL:  Failing row contains (-1, foo).
> > CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
> > ($1, $2), ($3, $4)
> > COPY ft1, line 3
> >
> > In single-insert mode the error context information is correct, but in
> > batch-insert mode it isn’t (i.e., the line number isn’t correct).
> >
> > The error occurs on the remote side, so I'm not sure if there is a
> > simple fix.  What I came up with is to just suppress error context
> > information other than the relation name, like the attached.  What do
> > you think about that?

> I've spent a lot of effort on this problem too. Your solution has a
> rationale and looks fine.
> I only think we should add a bit of info to the error report to clarify
> why no specific line can be pointed to here. For example:
> 'COPY %s (buffered)'
> or
> 'COPY FOREIGN TABLE %s'
>
> Or, if instead of the relname_only field we save a MultiInsertBuffer
> pointer, we might add min/max line numbers to the report:
> 'COPY %s, line between %llu and %llu'

I think the latter is more consistent with the existing error context
information when in CopyMultiInsertBufferFlush().  Actually, I thought
this too, and I think this would be useful when the COPY FROM command
is executed on a foreign table.  My concern, however, is the case when
the command is executed on a partitioned table containing foreign
partitions; in that case the input data would not always be sorted in
the partition order, so the range for an error-occurring foreign
partition might contain many lines with rows from other partitions,
which I think makes the range information less useful.  Maybe I'm too
worried about that, though.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-08-14 Thread Andrey Lepikhov

On 8/9/22 16:44, Etsuro Fujita wrote:

-1 foo
1 bar
\.

ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL:  Failing row contains (-1, foo).
CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
($1, $2), ($3, $4)
COPY ft1, line 3

In single-insert mode the error context information is correct, but in
batch-insert mode it isn’t (i.e., the line number isn’t correct).

The error occurs on the remote side, so I'm not sure if there is a
simple fix.  What I came up with is to just suppress error context
information other than the relation name, like the attached.  What do
you think about that?
I've spent a lot of effort on this problem too. Your solution has a
rationale and looks fine.
I only think we should add a bit of info to the error report to clarify
why no specific line can be pointed to here. For example:

'COPY %s (buffered)'
or
'COPY FOREIGN TABLE %s'

Or, if instead of the relname_only field we save a MultiInsertBuffer
pointer, we might add min/max line numbers to the report:

'COPY %s, line between %llu and %llu'

--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-08-10 Thread Etsuro Fujita
Hi,

On Wed, Aug 10, 2022 at 1:06 AM Zhihong Yu  wrote:
> On Tue, Aug 9, 2022 at 4:45 AM Etsuro Fujita  wrote:
>> * When running AFTER ROW triggers in CopyMultiInsertBufferFlush(), the
>> patch uses the slots passed to ExecForeignBatchInsert(), not the ones
>> returned by the callback function, but I don't think that that is
>> always correct, as the documentation about the callback function says:
>>
>>  The return value is an array of slots containing the data that was
>>  actually inserted (this might differ from the data supplied, for
>>  example as a result of trigger actions.)
>>  The passed-in slots can be re-used for this purpose.
>>
>> postgres_fdw re-uses the passed-in slots, but other FDWs might not, so
>> I modified the patch to reference the returned slots when running the
>> AFTER ROW triggers.

I noticed that my explanation was not correct.  Let me explain.
Before commit 82593b9a3, when batching into a view referencing a
postgres_fdw foreign table that has WCO constraints, postgres_fdw used
the passed-in slots to store the first tuple that was actually
inserted to the remote table.  But that commit disabled batching in
that case, so postgres_fdw wouldn’t use the passed-in slots (until we
support batching when there are WCO constraints from the parent views
and/or AFTER ROW triggers on the foreign table).

> +   /* If any rows were inserted, run AFTER ROW INSERT triggers. */
> ...
> +   for (i = 0; i < inserted; i++)
> +   {
> +   TupleTableSlot *slot = rslots[i];
> ...
> +   slot->tts_tableOid =
> +   RelationGetRelid(resultRelInfo->ri_RelationDesc);
>
> It seems the return value of 
> `RelationGetRelid(resultRelInfo->ri_RelationDesc)` can be stored in a 
> variable outside the for loop.
> Inside the for loop, assign this variable to slot->tts_tableOid.

Actually, I did this to match the code in ExecBatchInsert(), but that
seems like a good idea, so I’ll update the patch as such in the next
version.

Thanks for reviewing!

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-08-09 Thread Zhihong Yu
On Tue, Aug 9, 2022 at 4:45 AM Etsuro Fujita wrote:

> On Tue, Jul 26, 2022 at 7:19 PM Etsuro Fujita 
> wrote:
> > Yeah, I think the patch is getting better, but I noticed some issues,
> > so I'm working on them.  I think I can post a new version in the next
> > few days.
>
> * When running AFTER ROW triggers in CopyMultiInsertBufferFlush(), the
> patch uses the slots passed to ExecForeignBatchInsert(), not the ones
> returned by the callback function, but I don't think that that is
> always correct, as the documentation about the callback function says:
>
>  The return value is an array of slots containing the data that was
>  actually inserted (this might differ from the data supplied, for
>  example as a result of trigger actions.)
>  The passed-in slots can be re-used for this
> purpose.
>
> postgres_fdw re-uses the passed-in slots, but other FDWs might not, so
> I modified the patch to reference the returned slots when running the
> AFTER ROW triggers.  I also modified the patch to initialize the
> tts_tableOid.  Attached is an updated patch, in which I made some
> minor adjustments to CopyMultiInsertBufferFlush() as well.
>
> * The patch produces incorrect error context information:
>
> create extension postgres_fdw;
> create server loopback foreign data wrapper postgres_fdw options
> (dbname 'postgres');
> create user mapping for current_user server loopback;
> create table t1 (f1 int, f2 text);
> create foreign table ft1 (f1 int, f2 text) server loopback options
> (table_name 't1');
> alter table t1 add constraint t1_f1positive check (f1 >= 0);
> alter foreign table ft1 add constraint ft1_f1positive check (f1 >= 0);
>
> — single insert
> copy ft1 from stdin;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> -1 foo
> >> 1 bar
> >> \.
> ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
> DETAIL:  Failing row contains (-1, foo).
> CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES ($1, $2)
> COPY ft1, line 1: "-1 foo"
>
> — batch insert
> alter server loopback options (add batch_size '2');
> copy ft1 from stdin;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> -1 foo
> >> 1 bar
> >> \.
> ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
> DETAIL:  Failing row contains (-1, foo).
> CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
> ($1, $2), ($3, $4)
> COPY ft1, line 3
>
> In single-insert mode the error context information is correct, but in
> batch-insert mode it isn’t (i.e., the line number isn’t correct).
>
> The error occurs on the remote side, so I'm not sure if there is a
> simple fix.  What I came up with is to just suppress error context
> information other than the relation name, like the attached.  What do
> you think about that?
>
> (In CopyMultiInsertBufferFlush() your patch sets cstate->cur_lineno to
> buffer->linenos[i] even when running AFTER ROW triggers for the i-th
> row returned by ExecForeignBatchInsert(), but that wouldn’t always be
> correct, as the i-th returned row might not correspond to the i-th row
> originally stored in the buffer as the callback function returns only
> the rows that were actually inserted on the remote side.  I think the
> proposed fix would address this issue as well.)
>
> * The patch produces incorrect row count in cases where some/all of
> the rows passed to ExecForeignBatchInsert() weren’t inserted on the
> remote side:
>
> create function trig_null() returns trigger as $$ begin return NULL;
> end $$ language plpgsql;
> create trigger trig_null before insert on t1 for each row execute
> function trig_null();
>
> — single insert
> alter server loopback options (drop batch_size);
> copy ft1 from stdin;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> 0 foo
> >> 1 bar
> >> \.
> COPY 0
>
> — batch insert
> alter server loopback options (add batch_size '2');
> copy ft1 from stdin;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF signal.
> >> 0 foo
> >> 1 bar
> >> \.
> COPY 2
>
> The row count is correct in single-insert mode, but isn’t in batch-insert
> mode.
>
> The reason is that in batch-insert mode the row counter is updated
> immediately after adding the row to the buffer, not after doing
> ExecForeignBatchInsert(), which might ignore the row.  To fix, I
> modified the patch to delay updating the row counter (and the progress
> of the COPY command) until after doing the callback function.  For
> consistency, I also modified the patch to delay it even when batching
> into plain tables.  IMO I think that that would be more consistent
> with the single-insert mode, as in that mode we update them after
> writing the tuple out to the table or sending it to the remote side.

Re: Fast COPY FROM based on batch insert

2022-08-09 Thread Etsuro Fujita
On Tue, Jul 26, 2022 at 7:19 PM Etsuro Fujita  wrote:
> Yeah, I think the patch is getting better, but I noticed some issues,
> so I'm working on them.  I think I can post a new version in the next
> few days.

* When running AFTER ROW triggers in CopyMultiInsertBufferFlush(), the
patch uses the slots passed to ExecForeignBatchInsert(), not the ones
returned by the callback function, but I don't think that that is
always correct, as the documentation about the callback function says:

 The return value is an array of slots containing the data that was
 actually inserted (this might differ from the data supplied, for
 example as a result of trigger actions.)
 The passed-in slots can be re-used for this purpose.

postgres_fdw re-uses the passed-in slots, but other FDWs might not, so
I modified the patch to reference the returned slots when running the
AFTER ROW triggers.  I also modified the patch to initialize the
tts_tableOid.  Attached is an updated patch, in which I made some
minor adjustments to CopyMultiInsertBufferFlush() as well.

* The patch produces incorrect error context information:

create extension postgres_fdw;
create server loopback foreign data wrapper postgres_fdw options
(dbname 'postgres');
create user mapping for current_user server loopback;
create table t1 (f1 int, f2 text);
create foreign table ft1 (f1 int, f2 text) server loopback options
(table_name 't1');
alter table t1 add constraint t1_f1positive check (f1 >= 0);
alter foreign table ft1 add constraint ft1_f1positive check (f1 >= 0);

— single insert
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> -1 foo
>> 1 bar
>> \.
ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL:  Failing row contains (-1, foo).
CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES ($1, $2)
COPY ft1, line 1: "-1 foo"

— batch insert
alter server loopback options (add batch_size '2');
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> -1 foo
>> 1 bar
>> \.
ERROR:  new row for relation "t1" violates check constraint "t1_f1positive"
DETAIL:  Failing row contains (-1, foo).
CONTEXT:  remote SQL command: INSERT INTO public.t1(f1, f2) VALUES
($1, $2), ($3, $4)
COPY ft1, line 3

In single-insert mode the error context information is correct, but in
batch-insert mode it isn’t (i.e., the line number isn’t correct).

The error occurs on the remote side, so I'm not sure if there is a
simple fix.  What I came up with is to just suppress error context
information other than the relation name, like the attached.  What do
you think about that?

(In CopyMultiInsertBufferFlush() your patch sets cstate->cur_lineno to
buffer->linenos[i] even when running AFTER ROW triggers for the i-th
row returned by ExecForeignBatchInsert(), but that wouldn’t always be
correct, as the i-th returned row might not correspond to the i-th row
originally stored in the buffer as the callback function returns only
the rows that were actually inserted on the remote side.  I think the
proposed fix would address this issue as well.)

* The patch produces incorrect row count in cases where some/all of
the rows passed to ExecForeignBatchInsert() weren’t inserted on the
remote side:

create function trig_null() returns trigger as $$ begin return NULL;
end $$ language plpgsql;
create trigger trig_null before insert on t1 for each row execute
function trig_null();

— single insert
alter server loopback options (drop batch_size);
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 0 foo
>> 1 bar
>> \.
COPY 0

— batch insert
alter server loopback options (add batch_size '2');
copy ft1 from stdin;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
>> 0 foo
>> 1 bar
>> \.
COPY 2

The row count is correct in single-insert mode, but isn’t in batch-insert mode.

The reason is that in batch-insert mode the row counter is updated
immediately after adding the row to the buffer, not after doing
ExecForeignBatchInsert(), which might ignore the row.  To fix, I
modified the patch to delay updating the row counter (and the progress
of the COPY command) until after doing the callback function.  For
consistency, I also modified the patch to delay it even when batching
into plain tables.  IMO I think that that would be more consistent
with the single-insert mode, as in that mode we update them after
writing the tuple out to the table or sending it to the remote side.

* I modified the patch so that when batching into foreign tables we
skip useless steps in CopyMultiInsertBufferInit() and
CopyMultiInsertBufferCleanup().

That’s all I have for now.  Sorry for the delay.

Best regards,
Etsuro Fujita



Re: Fast COPY FROM based on batch insert

2022-07-31 Thread Etsuro Fujita
On Wed, Jul 27, 2022 at 2:42 PM Andrey Lepikhov wrote:
> On 7/22/22 13:14, Etsuro Fujita wrote:
> > On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov wrote:
> >> Analyzing multi-level heterogeneous partitioned configurations, I
> >> realized that a single write into a partition with a trigger will flush
> >> buffers for all other partitions of the parent table even if the parent
> >> doesn't have any triggers.
> >> It relates to the code:
> >> else if (insertMethod == CIM_MULTI_CONDITIONAL &&
> >> !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
> >> {
> >> /*
> >>  * Flush pending inserts if this partition can't use
> >>  * batching, so rows are visible to triggers etc.
> >>  */
> >> CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
> >> }
> >>
> >> Why is such a cascade flush really necessary, especially for BEFORE
> >> and INSTEAD OF triggers?
> >
> > BEFORE triggers on the chosen partition might query the parent table,
> > not just the partition, so I think we need to do this so that such
> > triggers can see all the rows that have been inserted into the parent
> > table until then.
> if you'll excuse me, I will add one more argument.
> It wasn't clear, so I've made an experiment: result of a SELECT in an
> INSERT trigger function shows only data, existed in the parent table
> before the start of COPY.

Is the trigger function declared VOLATILE?  If so, the trigger should
see modifications to the parent table as well.  See:

https://www.postgresql.org/docs/15/trigger-datachanges.html

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-07-26 Thread Andrey Lepikhov

On 7/22/22 13:14, Etsuro Fujita wrote:

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov wrote:

Analyzing multi-level heterogeneous partitioned configurations, I
realized that a single write into a partition with a trigger will flush
buffers for all other partitions of the parent table even if the parent
doesn't have any triggers.
It relates to the code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
!CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
{
/*
 * Flush pending inserts if this partition can't use
 * batching, so rows are visible to triggers etc.
 */
CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascade flush really necessary, especially for BEFORE and
INSTEAD OF triggers?


BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.

If you'll excuse me, I will add one more argument.
It wasn't clear, so I ran an experiment: the result of a SELECT in an
INSERT trigger function shows only the data that existed in the parent
table before the start of COPY.
So we have no tools to access newly inserted rows in a neighboring
partition, and don't need to flush tuple buffers immediately.

Where am I wrong?

--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-26 Thread Etsuro Fujita
On Fri, Jul 22, 2022 at 5:42 PM Andrey Lepikhov wrote:
> So, maybe we should switch the
> status of this patch to 'Ready for committer'?

Yeah, I think the patch is getting better, but I noticed some issues,
so I'm working on them.  I think I can post a new version in the next
few days.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-07-22 Thread Andrey Lepikhov

On 7/22/22 13:14, Etsuro Fujita wrote:

On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov

Why such cascade flush is really necessary, especially for BEFORE and
INSTEAD OF triggers?


BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.
Thanks for the explanation of your point of view. So, maybe we should switch
the status of this patch to 'Ready for Committer'?


--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-22 Thread Etsuro Fujita
On Fri, Jul 22, 2022 at 3:39 PM Andrey Lepikhov
 wrote:
> Analyzing multi-level heterogeneous partitioned configurations I
> realized, that single write into a partition with a trigger will flush
> buffers for all other partitions of the parent table even if the parent
> haven't any triggers.
> It relates to the code:
> else if (insertMethod == CIM_MULTI_CONDITIONAL &&
>          !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
> {
>     /*
>      * Flush pending inserts if this partition can't use
>      * batching, so rows are visible to triggers etc.
>      */
>     CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
> }
>
> Why such cascade flush is really necessary, especially for BEFORE and
> INSTEAD OF triggers?

BEFORE triggers on the chosen partition might query the parent table,
not just the partition, so I think we need to do this so that such
triggers can see all the rows that have been inserted into the parent
table until then.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-07-22 Thread Andrey Lepikhov

On 7/20/22 13:10, Etsuro Fujita wrote:

On Tue, Jul 19, 2022 at 6:35 PM Andrey Lepikhov
 wrote:

On 18/7/2022 13:22, Etsuro Fujita wrote:

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly.  Attached is an
updated patch.  What do you think about that?



maybe you forgot this code:
/*
   * If a partition's root parent isn't allowed to use it, neither is the
   * partition.
*/
if (rootResultRelInfo->ri_usesMultiInsert)
 leaf_part_rri->ri_usesMultiInsert =
 ExecMultiInsertAllowed(leaf_part_rri);


I think the patch accounts for that.  Consider this bit to determine
whether to use batching for the partition chosen by
ExecFindPartition():

Agreed.

Analyzing multi-level heterogeneous partitioned configurations, I realized
that a single write into a partition with a trigger will flush the buffers
for all other partitions of the parent table, even if the parent doesn't
have any triggers.

It relates to the code:
else if (insertMethod == CIM_MULTI_CONDITIONAL &&
         !CopyMultiInsertInfoIsEmpty(&multiInsertInfo))
{
    /*
     * Flush pending inserts if this partition can't use
     * batching, so rows are visible to triggers etc.
     */
    CopyMultiInsertInfoFlush(&multiInsertInfo, resultRelInfo);
}

Why is such a cascade flush really necessary, especially for BEFORE and
INSTEAD OF triggers? An AFTER trigger should see all rows of the table, but
if the parent doesn't have one, I think we aren't obligated to guarantee the
order of COPY into two different tables.


--
Regards
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-20 Thread Etsuro Fujita
On Tue, Jul 19, 2022 at 6:35 PM Andrey Lepikhov
 wrote:
> On 18/7/2022 13:22, Etsuro Fujita wrote:
> > I rewrote the decision logic to something much simpler and much less
> > invasive, which reduces the patch size significantly.  Attached is an
> > updated patch.  What do you think about that?

> maybe you forgot this code:
> /*
>   * If a partition's root parent isn't allowed to use it, neither is the
>   * partition.
> */
> if (rootResultRelInfo->ri_usesMultiInsert)
> leaf_part_rri->ri_usesMultiInsert =
> ExecMultiInsertAllowed(leaf_part_rri);

I think the patch accounts for that.  Consider this bit to determine
whether to use batching for the partition chosen by
ExecFindPartition():

@@ -910,12 +962,14 @@ CopyFrom(CopyFromState cstate)

/*
 * Disable multi-inserts when the partition has BEFORE/INSTEAD
-* OF triggers, or if the partition is a foreign partition.
+* OF triggers, or if the partition is a foreign partition
+* that can't use batching.
 */
leafpart_use_multi_insert = insertMethod == CIM_MULTI_CONDITIONAL &&
!has_before_insert_row_trig &&
!has_instead_insert_row_trig &&
-   resultRelInfo->ri_FdwRoutine == NULL;
+   (resultRelInfo->ri_FdwRoutine == NULL ||
+resultRelInfo->ri_BatchSize > 1);

If the root parent isn't allowed to use batching, then we have
insertMethod=CIM_SINGLE for the parent before we get here.  So in that
case we have leafpart_use_multi_insert=false for the chosen partition,
meaning that the partition isn't allowed to use batching, either.
(The patch just extends the existing decision logic to the
foreign-partition case.)

> Also, maybe we should describe in the documentation that, if the value of batch_size is
> more than 1, the ExecForeignBatchInsert routine may be called?

Yeah, but I think that is the existing behavior, and that the patch
doesn't change the behavior, so I would leave that for another patch.

Thanks for reviewing!

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-07-19 Thread Andrey Lepikhov

On 18/7/2022 13:22, Etsuro Fujita wrote:

On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov
 wrote:

On 3/22/22 06:54, Etsuro Fujita wrote:

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed().  But I’m not sure we really need such a
change.  Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

Of course, such approach would look much better, if we implemented it.



I'll ponder how to do it.


I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly.  Attached is an
updated patch.  What do you think about that?

While working on the patch, I fixed a few issues as well:

+   if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+   resultRelInfo->ri_BatchSize =
+
resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

When determining the batch size, I think we should check if the
ExecForeignBatchInsert callback routine is also defined, like other
places such as execPartition.c.  For consistency I fixed this by
copying-and-pasting the code from that file.

+* Also, the COPY command requires a non-zero input list of attributes.
+* Therefore, the length of the attribute list is checked here.
+*/
+   if (!cstate->volatile_defexprs &&
+   list_length(cstate->attnumlist) > 0 &&
+   !contain_volatile_functions(cstate->whereClause))
+   target_resultRelInfo->ri_usesMultiInsert =
+   ExecMultiInsertAllowed(target_resultRelInfo);

I think “list_length(cstate->attnumlist) > 0” in the if-test would
break COPY FROM; it currently supports multi-inserting into *plain*
tables even in the case where they have no columns, but this would
disable the multi-insertion support in that case.  postgres_fdw would
not be able to batch into zero-column foreign tables due to the INSERT
syntax limitation (i.e., the syntax does not allow inserting multiple
empty rows into a zero-column table in a single INSERT statement).
Which is the reason why this was added to the if-test?  But I think
some other FDWs might be able to, so I think we should let the FDW
decide whether to allow batching even in that case, when called from
GetForeignModifyBatchSize.  So I removed the attnumlist test from the
patch, and modified postgresGetForeignModifyBatchSize as such.  I
might miss something, though.

Thanks a lot. Maybe you forgot this code:
/*
 * If a partition's root parent isn't allowed to use it, neither is the
 * partition.
 */
if (rootResultRelInfo->ri_usesMultiInsert)
    leaf_part_rri->ri_usesMultiInsert =
        ExecMultiInsertAllowed(leaf_part_rri);

Also, maybe we should describe in the documentation that, if the value of
batch_size is more than 1, the ExecForeignBatchInsert routine may be called?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-18 Thread Etsuro Fujita
On Thu, Mar 24, 2022 at 3:43 PM Andrey V. Lepikhov
 wrote:
> On 3/22/22 06:54, Etsuro Fujita wrote:
> > * To allow foreign multi insert, the patch made an invasive change to
> > the existing logic to determine whether to use multi insert for the
> > target relation, adding a new member ri_usesMultiInsert to the
> > ResultRelInfo struct, as well as introducing a new function
> > ExecMultiInsertAllowed().  But I’m not sure we really need such a
> > change.  Isn’t it reasonable to *adjust* the existing logic to allow
> > foreign multi insert when possible?
> Of course, such approach would look much better, if we implemented it.

> I'll ponder how to do it.

I rewrote the decision logic to something much simpler and much less
invasive, which reduces the patch size significantly.  Attached is an
updated patch.  What do you think about that?

While working on the patch, I fixed a few issues as well:

+   if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL)
+   resultRelInfo->ri_BatchSize =
+
resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

When determining the batch size, I think we should check if the
ExecForeignBatchInsert callback routine is also defined, like other
places such as execPartition.c.  For consistency I fixed this by
copying-and-pasting the code from that file.

+* Also, the COPY command requires a non-zero input list of attributes.
+* Therefore, the length of the attribute list is checked here.
+*/
+   if (!cstate->volatile_defexprs &&
+   list_length(cstate->attnumlist) > 0 &&
+   !contain_volatile_functions(cstate->whereClause))
+   target_resultRelInfo->ri_usesMultiInsert =
+   ExecMultiInsertAllowed(target_resultRelInfo);

I think “list_length(cstate->attnumlist) > 0” in the if-test would
break COPY FROM; it currently supports multi-inserting into *plain*
tables even in the case where they have no columns, but this would
disable the multi-insertion support in that case.  postgres_fdw would
not be able to batch into zero-column foreign tables due to the INSERT
syntax limitation (i.e., the syntax does not allow inserting multiple
empty rows into a zero-column table in a single INSERT statement).
Which is the reason why this was added to the if-test?  But I think
some other FDWs might be able to, so I think we should let the FDW
decide whether to allow batching even in that case, when called from
GetForeignModifyBatchSize.  So I removed the attnumlist test from the
patch, and modified postgresGetForeignModifyBatchSize as such.  I
might miss something, though.

Best regards,
Etsuro Fujita


v4-0001-Implementation-of-a-Bulk-COPY-FROM-efujita-1.patch
Description: Binary data


Re: Fast COPY FROM based on batch insert

2022-07-10 Thread Andrey Lepikhov

On 11/7/2022 04:12, Ian Barwick wrote:

On 09/07/2022 00:09, Andrey Lepikhov wrote:

On 8/7/2022 05:12, Ian Barwick wrote:
 ERROR:  bind message supplies 0 parameters, but prepared 
statement "pgsql_fdw_prep_178" requires 6
 CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, 
v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)

 COPY foo, line 88160
Thanks, I got it. MultiInsertBuffer are created on the first non-zero 
flush of tuples into the partition and isn't deleted from the buffers 
list until the end of COPY. And on a subsequent flush in the case of 
empty buffer we catch the error.
Your fix is correct, but I want to propose slightly different change 
(see in attachment).


LGTM.

New version (with aforementioned changes) is attached.

--
regards,
Andrey Lepikhov
Postgres Professional

From 976560f2ad406adba1aaf58a188b44302855ee12 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign
 table.

---
 .../postgres_fdw/expected/postgres_fdw.out|  45 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  47 
 src/backend/commands/copyfrom.c   | 211 --
 src/backend/executor/execMain.c   |  45 
 src/backend/executor/execPartition.c  |   8 +
 src/include/commands/copyfrom_internal.h  |  10 -
 src/include/executor/executor.h   |   1 +
 src/include/nodes/execnodes.h |   5 +-
 8 files changed, 238 insertions(+), 134 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out 
b/contrib/postgres_fdw/expected/postgres_fdw.out
index 44457f930c..aced9a6428 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8435,6 +8435,7 @@ drop table loct2;
 -- ===
 -- test COPY FROM
 -- ===
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 
'loc2');
@@ -8457,7 +8458,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1 xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 +-
@@ -8468,6 +8469,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+   server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+   server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+   begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+   for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
for each statement execute procedure trigger_func();
@@ -8576,6 +8590,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), 
($3, $4)
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
++
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8592,6 +8634,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===
 -- test for TRUNCATE
 -- ===
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql 
b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 92d1212027..5c047ce8ee 100644
--- 

Re: Fast COPY FROM based on batch insert

2022-07-10 Thread Ian Barwick

On 09/07/2022 00:09, Andrey Lepikhov wrote:

On 8/7/2022 05:12, Ian Barwick wrote:

 ERROR:  bind message supplies 0 parameters, but prepared statement 
"pgsql_fdw_prep_178" requires 6
 CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, 
v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
 COPY foo, line 88160

Thanks, I got it. MultiInsertBuffer are created on the first non-zero flush of 
tuples into the partition and isn't deleted from the buffers list until the end 
of COPY. And on a subsequent flush in the case of empty buffer we catch the 
error.
Your fix is correct, but I want to propose slightly different change (see in 
attachment).


LGTM.

Regards

Ian Barwick

--
https://www.enterprisedb.com/




Re: Fast COPY FROM based on batch insert

2022-07-08 Thread Andrey Lepikhov

On 8/7/2022 05:12, Ian Barwick wrote:
     ERROR:  bind message supplies 0 parameters, but prepared statement 
"pgsql_fdw_prep_178" requires 6
     CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, 
v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)

     COPY foo, line 88160
Thanks, I got it. A MultiInsertBuffer is created on the first non-empty
flush of tuples into the partition and isn't deleted from the buffers
list until the end of COPY, so on a subsequent flush with an empty buffer
we hit the error.
Your fix is correct, but I want to propose a slightly different change
(see the attachment).


--
regards,
Andrey Lepikhov
Postgres Professional

diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index 245a260982..203289f7f2 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -329,7 +329,8 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 			   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize != NULL);
 
 	/* Flush into foreign table or partition */
-	do {
+	while(sent < nused)
+	{
 		int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
 			resultRelInfo->ri_BatchSize : (nused - sent);
 		int inserted = size;
@@ -340,7 +341,7 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 															 NULL,
 															 &inserted);
 		sent += size;
-	} while (sent < nused);
+	}
 	}
 	else
 	{

Re: Fast COPY FROM based on batch insert

2022-07-07 Thread Ian Barwick

On 07/07/2022 22:51, Andrey Lepikhov wrote:
> On 7/7/2022 06:14, Ian Barwick wrote:
>> On Thu, 24 Mar 2022 at 15:44, Andrey V. Lepikhov wrote:
>>  >
>>  > On 3/22/22 06:54, Etsuro Fujita wrote:
>>  > > On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
>>  > >  wrote:
>>  > >> We still have slow 'COPY FROM' operation for foreign tables in current
>>  > >> master.
>>  > >> Now we have a foreign batch insert operation And I tried to rewrite the
>>  > >> patch [1] with this machinery.
>>  > >
>>  > > The patch has been rewritten to something essentially different, but
>>  > > no one reviewed it.  (Tsunakawa-san gave some comments without looking
>>  > > at it, though.)  So the right status of the patch is “Needs review”,
>>  > > rather than “Ready for Committer”?  Anyway, here are a few review
>>  > > comments from me:
>>  > >
>>  > > * I don’t think this assumption is correct:
>>  > >
>>  > > @@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo 
*miinfo,
>>  > > (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
>>  > >resultRelInfo->ri_TrigDesc->trig_insert_new_table))
>>  > >  {
>>  > > +   /*
>>  > > +* AFTER ROW triggers aren't allowed with the foreign bulk 
insert
>>  > > +* method.
>>  > > +*/
>>  > > +   Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind !=
>>  > > RELKIND_FOREIGN_TABLE);
>>  > > +
>>  > >
>>  > > In postgres_fdw we disable foreign batch insert when the target table
>>  > > has AFTER ROW triggers, but the core allows it even in that case.  No?
>>  > Agree
>>  >
>>  > > * To allow foreign multi insert, the patch made an invasive change to
>>  > > the existing logic to determine whether to use multi insert for the
>>  > > target relation, adding a new member ri_usesMultiInsert to the
>>  > > ResultRelInfo struct, as well as introducing a new function
>>  > > ExecMultiInsertAllowed().  But I’m not sure we really need such a
>>  > > change.  Isn’t it reasonable to *adjust* the existing logic to allow
>>  > > foreign multi insert when possible?
>>  > Of course, such approach would look much better, if we implemented it.
>>  > I'll ponder how to do it.
>>  >
>>  > > I didn’t finish my review, but I’ll mark this as “Waiting on Author”.
>>  > I rebased the patch onto current master. Now it works correctly. I'll
>>  > mark it as "Waiting for review".
>>
>> I took a look at this patch as it would a useful optimization to have.
>>
>> It applies cleanly to current HEAD, but as-is, with a large data set, it
>> reproducibly fails like this (using postgres_fdw):
>>
>>  postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format 
csv);
>>  ERROR:  bind message supplies 0 parameters, but prepared statement 
"pgsql_fdw_prep_19422" requires 6
>>  CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, 
v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
>>  COPY foo, line 17281589
>>
>> This occurs because not all multi-insert buffers being flushed actually 
contain
>> tuples; the fix is simply not to call ExecForeignBatchInsert() if that's the 
case,
>> e.g:
>>
>>
>>     /* Flush into foreign table or partition */
>>     do {
>>         int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
>>             resultRelInfo->ri_BatchSize : (nused - sent);
>>
>>         if (size)
>>         {
>>             int inserted = size;
>>
>>             resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
>>                                                                  resultRelInfo,
>>                                                                  &slots[sent],
>>                                                                  NULL,
>>                                                                  &inserted);
>>             sent += size;
>>         }
>>     } while (sent < nused);
>>
>>
>> There might a case for arguing that the respective FDW should check that it 
has
>> actually received some tuples to insert, but IMHO it's much preferable to 
catch
>> this as early as possible and avoid a superfluous call.
>>
>> FWIW, with the above fix in place, with a simple local test the patch 
produces a
>> consistent speed-up of about 8 times compared to the existing functionality.
>
> Thank you for the attention to the patch.
> I have a couple of questions:
>
> 1. It's a problem for me to reproduce the case you reported. Can you give more
>details on the reproduction?

The issue seems to occur when the data spans more than one foreign partition,
probably because the accumulated data for one partition needs to be flushed
before moving on to the next partition, but not all pre-allocated multi-insert
buffers have been filled.

The reproduction method I have, which is pared down from the original bulk 
insert
which triggered the error, is as follows:

1. Create some data using the attached script:

perl data.pl > /tmp/data.csv

2. Create two nodes (A and B)

3. On node B, create tables as follows:

CREATE TABLE foo_part_1 (t timestamptz, v1 int, v2 int, v3 int, v4 text, v5 
text);
CREATE TABLE foo_part_2 (t timestamptz, v1 int, v2 

Re: Fast COPY FROM based on batch insert

2022-07-07 Thread Andrey Lepikhov

On 7/7/2022 06:14, Ian Barwick wrote:

On Thu, 24 Mar 2022 at 15:44, Andrey V. Lepikhov wrote:
 >
 > On 3/22/22 06:54, Etsuro Fujita wrote:
 > > On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
 > >  wrote:
 > >> We still have slow 'COPY FROM' operation for foreign tables in 
current

 > >> master.
 > >> Now we have a foreign batch insert operation And I tried to 
rewrite the

 > >> patch [1] with this machinery.
 > >
 > > The patch has been rewritten to something essentially different, but
 > > no one reviewed it.  (Tsunakawa-san gave some comments without looking
 > > at it, though.)  So the right status of the patch is “Needs review”,
 > > rather than “Ready for Committer”?  Anyway, here are a few review
 > > comments from me:
 > >
 > > * I don’t think this assumption is correct:
 > >
 > > @@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo 
*miinfo,
 > >   
(resultRelInfo->ri_TrigDesc->trig_insert_after_row ||

 > >    resultRelInfo->ri_TrigDesc->trig_insert_new_table))
 > >  {
 > > +   /*
 > > +    * AFTER ROW triggers aren't allowed with the foreign 
bulk insert

 > > +    * method.
 > > +    */
 > > +   Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind !=
 > > RELKIND_FOREIGN_TABLE);
 > > +
 > >
 > > In postgres_fdw we disable foreign batch insert when the target table
 > > has AFTER ROW triggers, but the core allows it even in that case.  No?
 > Agree
 >
 > > * To allow foreign multi insert, the patch made an invasive change to
 > > the existing logic to determine whether to use multi insert for the
 > > target relation, adding a new member ri_usesMultiInsert to the
 > > ResultRelInfo struct, as well as introducing a new function
 > > ExecMultiInsertAllowed().  But I’m not sure we really need such a
 > > change.  Isn’t it reasonable to *adjust* the existing logic to allow
 > > foreign multi insert when possible?
 > Of course, such approach would look much better, if we implemented it.
 > I'll ponder how to do it.
 >
 > > I didn’t finish my review, but I’ll mark this as “Waiting on Author”.
 > I rebased the patch onto current master. Now it works correctly. I'll
 > mark it as "Waiting for review".

I took a look at this patch as it would be a useful optimization to have.

It applies cleanly to current HEAD, but as-is, with a large data set, it
reproducibly fails like this (using postgres_fdw):

     postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format csv);
     ERROR:  bind message supplies 0 parameters, but prepared statement "pgsql_fdw_prep_19422" requires 6
     CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
     COPY foo, line 17281589

This occurs because not all multi-insert buffers being flushed actually contain
tuples; the fix is simply not to call ExecForeignBatchInsert() if that's the case,
e.g:


     /* Flush into foreign table or partition */
     do {
         int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
             resultRelInfo->ri_BatchSize : (nused - sent);

         if (size)
         {
             int inserted = size;

             resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
                                                                  resultRelInfo,
                                                                  &slots[sent],
                                                                  NULL,
                                                                  &inserted);
             sent += size;
         }
     } while (sent < nused);


There might be a case for arguing that the respective FDW should check that
it has actually received some tuples to insert, but IMHO it's much preferable
to catch this as early as possible and avoid a superfluous call.

FWIW, with the above fix in place, with a simple local test the patch
produces a consistent speed-up of about 8 times compared to the existing
functionality.

Thank you for the attention to the patch.
I have a couple of questions:
1. I'm having trouble reproducing the case you reported. Can you give
more details on the reproduction?
2. Have you tried the previous version, based on the bulk COPY machinery
rather than bulk INSERT? Which approach looks better and has better
performance, in your opinion?


--
regards,
Andrey Lepikhov
Postgres Professional




Re: Fast COPY FROM based on batch insert

2022-07-06 Thread Ian Barwick

On Thu, 24 Mar 2022 at 15:44, Andrey V. Lepikhov wrote:
>
> On 3/22/22 06:54, Etsuro Fujita wrote:
> > On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
> >  wrote:
> >> We still have slow 'COPY FROM' operation for foreign tables in current
> >> master.
> >> Now we have a foreign batch insert operation And I tried to rewrite the
> >> patch [1] with this machinery.
> >
> > The patch has been rewritten to something essentially different, but
> > no one reviewed it.  (Tsunakawa-san gave some comments without looking
> > at it, though.)  So the right status of the patch is “Needs review”,
> > rather than “Ready for Committer”?  Anyway, here are a few review
> > comments from me:
> >
> > * I don’t think this assumption is correct:
> >
> > @@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
> >   (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
> >resultRelInfo->ri_TrigDesc->trig_insert_new_table))
> >  {
> > +   /*
> > +* AFTER ROW triggers aren't allowed with the foreign bulk 
insert
> > +* method.
> > +*/
> > +   Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
> > +
> >
> > In postgres_fdw we disable foreign batch insert when the target table
> > has AFTER ROW triggers, but the core allows it even in that case.  No?
> Agree
>
> > * To allow foreign multi insert, the patch made an invasive change to
> > the existing logic to determine whether to use multi insert for the
> > target relation, adding a new member ri_usesMultiInsert to the
> > ResultRelInfo struct, as well as introducing a new function
> > ExecMultiInsertAllowed().  But I’m not sure we really need such a
> > change.  Isn’t it reasonable to *adjust* the existing logic to allow
> > foreign multi insert when possible?
> Of course, such approach would look much better, if we implemented it.
> I'll ponder how to do it.
>
> > I didn’t finish my review, but I’ll mark this as “Waiting on Author”.
> I rebased the patch onto current master. Now it works correctly. I'll
> mark it as "Waiting for review".

I took a look at this patch as it would be a useful optimization to have.

It applies cleanly to current HEAD, but as-is, with a large data set, it
reproducibly fails like this (using postgres_fdw):

postgres=# COPY foo FROM '/tmp/fast-copy-from/test.csv' WITH (format csv);
ERROR:  bind message supplies 0 parameters, but prepared statement 
"pgsql_fdw_prep_19422" requires 6
CONTEXT:  remote SQL command: INSERT INTO public.foo_part_1(t, v1, v2, v3, 
v4, v5) VALUES ($1, $2, $3, $4, $5, $6)
COPY foo, line 17281589

This occurs because not all multi-insert buffers being flushed actually contain
tuples; the fix is simply not to call ExecForeignBatchInsert() if that's the case,
e.g:


/* Flush into foreign table or partition */
do {
    int size = (resultRelInfo->ri_BatchSize < nused - sent) ?
        resultRelInfo->ri_BatchSize : (nused - sent);

    if (size)
    {
        int inserted = size;

        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
                                                             resultRelInfo,
                                                             &slots[sent],
                                                             NULL,
                                                             &inserted);
        sent += size;
    }
} while (sent < nused);


There might be a case for arguing that the respective FDW should check that it has
actually received some tuples to insert, but IMHO it's much preferable to catch
this as early as possible and avoid a superfluous call.

FWIW, with the above fix in place, with a simple local test the patch produces a
consistent speed-up of about 8 times compared to the existing functionality.


Regards

Ian Barwick

--

EnterpriseDB - https://www.enterprisedb.com




Re: Fast COPY FROM based on batch insert

2022-03-24 Thread Andrey V. Lepikhov

On 3/22/22 06:54, Etsuro Fujita wrote:

On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
 wrote:

We still have slow 'COPY FROM' operation for foreign tables in current
master.
Now we have a foreign batch insert operation And I tried to rewrite the
patch [1] with this machinery.


The patch has been rewritten to something essentially different, but
no one reviewed it.  (Tsunakawa-san gave some comments without looking
at it, though.)  So the right status of the patch is “Needs review”,
rather than “Ready for Committer”?  Anyway, here are a few review
comments from me:

* I don’t think this assumption is correct:

@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
  (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
   resultRelInfo->ri_TrigDesc->trig_insert_new_table))
 {
+   /*
+* AFTER ROW triggers aren't allowed with the foreign bulk insert
+* method.
+*/
+   Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind != RELKIND_FOREIGN_TABLE);
+

In postgres_fdw we disable foreign batch insert when the target table
has AFTER ROW triggers, but the core allows it even in that case.  No?

Agree


* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed().  But I’m not sure we really need such a
change.  Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?
Of course, such an approach would look much better if we implemented it.
I'll ponder how to do it.
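As a starting point, the eligibility test could be kept in one place, roughly along these lines. This is a simplified, self-contained model in the spirit of the patch's ExecMultiInsertAllowed(); the struct and field names are illustrative, not the real ResultRelInfo, and the real decision in copyfrom.c also considers conditions (such as transition capture) omitted here:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative stand-in for the relation properties that matter. */
typedef struct RelInfoModel
{
    bool    is_foreign;             /* RELKIND_FOREIGN_TABLE? */
    bool    fdw_supports_batching;  /* FDW provides batch insert */
    bool    has_before_row_trigger; /* rows must be seen one at a time */
    bool    has_instead_row_trigger;
    bool    has_volatile_defaults;  /* per-row evaluation order matters */
} RelInfoModel;

static bool
multi_insert_allowed(const RelInfoModel *rel)
{
    if (rel->has_before_row_trigger || rel->has_instead_row_trigger)
        return false;
    if (rel->has_volatile_defaults)
        return false;

    /* The adjustment under discussion: a foreign table qualifies
     * whenever its FDW can batch, instead of being excluded outright. */
    if (rel->is_foreign)
        return rel->fdw_supports_batching;

    return true;
}
```

The point of centralizing the test is that the foreign-table case becomes one more clause in the existing logic rather than a parallel code path.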



I didn’t finish my review, but I’ll mark this as “Waiting on Author”.
I rebased the patch onto current master. Now it works correctly. I'll 
mark it as "Waiting for review".


--
regards,
Andrey Lepikhov
Postgres Professional

From 2d51d0f5d94a3e4b3400714b5841228d1896fb56 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign
 table.

---
 .../postgres_fdw/expected/postgres_fdw.out|  45 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  47 
 src/backend/commands/copyfrom.c   | 210 --
 src/backend/executor/execMain.c   |  45 
 src/backend/executor/execPartition.c  |   8 +
 src/include/commands/copyfrom_internal.h  |  10 -
 src/include/executor/executor.h   |   1 +
 src/include/nodes/execnodes.h |   5 +-
 8 files changed, 237 insertions(+), 134 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f210f91188..a803029f2f 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8415,6 +8415,7 @@ drop table loct2;
 -- ===
 -- test COPY FROM
 -- ===
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
 create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -8437,7 +8438,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1	xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 +-
@@ -8448,6 +8449,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+	server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+	server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+	begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+	for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
 	for each statement execute procedure trigger_func();
@@ -8556,6 +8570,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist

Re: Fast COPY FROM based on batch insert

2022-03-21 Thread Etsuro Fujita
On Tue, Mar 22, 2022 at 8:58 AM Andres Freund  wrote:
>
> On 2021-06-07 16:16:58 +0500, Andrey Lepikhov wrote:
> > Second version of the patch fixes problems detected by the FDW regression
> > tests and shows differences of error reports in tuple-by-tuple and batched
> > COPY approaches.
>
> Patch doesn't apply and likely hasn't for a while...

Actually, it has bit-rotted due to the recent fix for cross-partition
updates (i.e., commit ba9a7e392).

Thanks!

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-03-21 Thread Etsuro Fujita
On Fri, Jun 4, 2021 at 5:26 PM Andrey Lepikhov
 wrote:
> We still have a slow 'COPY FROM' operation for foreign tables in current
> master.
> Now we have a foreign batch insert operation, and I tried to rewrite the
> patch [1] with this machinery.

I’d been reviewing the previous version of the patch without noticing
this.  (Gmail grouped it in a new thread due to the subject change,
but I overlooked the whole thread.)

I agree with you that the first step for fast copy into foreign
tables/partitions is to use the foreign-batch-insert API.  (Actually,
I was also thinking the same while reviewing the previous version.)
Thanks for the new version of the patch!

The patch has been rewritten to something essentially different, but
no one reviewed it.  (Tsunakawa-san gave some comments without looking
at it, though.)  So the right status of the patch is “Needs review”,
rather than “Ready for Committer”?  Anyway, here are a few review
comments from me:

* I don’t think this assumption is correct:

@@ -359,6 +386,12 @@ CopyMultiInsertBufferFlush(CopyMultiInsertInfo *miinfo,
 (resultRelInfo->ri_TrigDesc->trig_insert_after_row ||
  resultRelInfo->ri_TrigDesc->trig_insert_new_table))
{
+   /*
+* AFTER ROW triggers aren't allowed with the foreign bulk insert
+* method.
+*/
+   Assert(resultRelInfo->ri_RelationDesc->rd_rel->relkind !=
RELKIND_FOREIGN_TABLE);
+

In postgres_fdw we disable foreign batch insert when the target table
has AFTER ROW triggers, but the core allows it even in that case.  No?

* To allow foreign multi insert, the patch made an invasive change to
the existing logic to determine whether to use multi insert for the
target relation, adding a new member ri_usesMultiInsert to the
ResultRelInfo struct, as well as introducing a new function
ExecMultiInsertAllowed().  But I’m not sure we really need such a
change.  Isn’t it reasonable to *adjust* the existing logic to allow
foreign multi insert when possible?

I didn’t finish my review, but I’ll mark this as “Waiting on Author”.

My apologies for the long long delay.

Best regards,
Etsuro Fujita




Re: Fast COPY FROM based on batch insert

2022-03-21 Thread Andres Freund
On 2021-06-07 16:16:58 +0500, Andrey Lepikhov wrote:
> Second version of the patch fixes problems detected by the FDW regression
> tests and shows differences of error reports in tuple-by-tuple and batched
> COPY approaches.

Patch doesn't apply and likely hasn't for a while...




Re: Fast COPY FROM based on batch insert

2021-06-07 Thread Andrey Lepikhov
The second version of the patch fixes problems detected by the FDW 
regression tests and shows the differences in error reporting between 
the tuple-by-tuple and batched COPY approaches.
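The change in the regression test's expected output (from `COPY rem2, line 1: "-1	xyzzy"` to `COPY rem2, line 2`) follows from when the error is detected. Below is a simplified standalone model of the line number an error report would carry, assuming the reader has advanced to the end of the batch by flush time; this is a plausible reading of the diff, not the exact copyfrom.c bookkeeping:

```c
#include <assert.h>

/* With tuple-by-tuple inserts a row is checked as soon as it is read,
 * so the error context names the offending line itself.  With batching
 * the failure only surfaces when the buffer is flushed, after the
 * reader has advanced to the end of the batch containing the bad row. */
static int
reported_line(int bad_row, int batch_size)
{
    if (batch_size <= 1)
        return bad_row;         /* detected immediately */

    /* round bad_row up to the end of its batch */
    return (bad_row + batch_size - 1) / batch_size * batch_size;
}
```

With batch_size '2' and the constraint violation on input line 1, the model reports line 2, matching the new expected output; it also shows why batched error context can no longer quote the failing line's contents.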


--
regards,
Andrey Lepikhov
Postgres Professional
From 68ad02038d7477e005b65bf5aeeac4efbb41073e Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 4 Jun 2021 13:21:43 +0500
Subject: [PATCH] Implementation of a Bulk COPY FROM operation into foreign
 table.

---
 .../postgres_fdw/expected/postgres_fdw.out|  45 +++-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  47 
 src/backend/commands/copyfrom.c   | 216 --
 src/backend/executor/execMain.c   |  45 
 src/backend/executor/execPartition.c  |   8 +
 src/include/commands/copyfrom_internal.h  |  10 -
 src/include/executor/executor.h   |   1 +
 src/include/nodes/execnodes.h |   8 +-
 8 files changed, 246 insertions(+), 134 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index f320a7578d..146a3be576 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8143,6 +8143,7 @@ drop table loct2;
 -- ===
 -- test COPY FROM
 -- ===
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -8165,7 +8166,7 @@ copy rem2 from stdin; -- ERROR
 ERROR:  new row for relation "loc2" violates check constraint "loc2_f1positive"
 DETAIL:  Failing row contains (-1, xyzzy).
 CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2)
-COPY rem2, line 1: "-1 xyzzy"
+COPY rem2, line 2
 select * from rem2;
  f1 | f2  
 +-
@@ -8176,6 +8177,19 @@ select * from rem2;
 alter foreign table rem2 drop constraint rem2_f1positive;
 alter table loc2 drop constraint loc2_f1positive;
 delete from rem2;
+create table foo (a int) partition by list (a);
+create table foo1 (like foo);
+create foreign table ffoo1 partition of foo for values in (1)
+   server loopback options (table_name 'foo1');
+create table foo2 (like foo);
+create foreign table ffoo2 partition of foo for values in (2)
+   server loopback options (table_name 'foo2');
+create function print_new_row() returns trigger language plpgsql as $$
+   begin raise notice '%', new; return new; end; $$;
+create trigger ffoo1_br_trig before insert on ffoo1
+   for each row execute function print_new_row();
+copy foo from stdin;
+NOTICE:  (1)
 -- Test local triggers
 create trigger trig_stmt_before before insert on rem2
for each statement execute procedure trigger_func();
@@ -8284,6 +8298,34 @@ drop trigger rem2_trig_row_before on rem2;
 drop trigger rem2_trig_row_after on rem2;
 drop trigger loc2_trig_row_before_insert on loc2;
 delete from rem2;
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+ERROR:  column "f1" of relation "loc2" does not exist
+CONTEXT:  remote SQL command: INSERT INTO public.loc2(f1, f2) VALUES ($1, $2), ($3, $4)
+COPY rem2, line 3
+alter table loc2 add column f1 int;
+alter table loc2 add column f2 int;
+select * from rem2;
+ f1 | f2 
+----+----
+(0 rows)
+
+-- dropped columns locally and on the foreign server
+alter table rem2 drop column f1;
+alter table rem2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(2 rows)
+
+alter table loc2 drop column f1;
+alter table loc2 drop column f2;
+copy rem2 from stdin;
+select * from rem2;
+--
+(4 rows)
+
 -- test COPY FROM with foreign table created in the same transaction
 create table loc3 (f1 int, f2 text);
 begin;
@@ -8300,6 +8342,7 @@ select * from rem3;
 
 drop foreign table rem3;
 drop table loc3;
+alter server loopback options (drop batch_size);
 -- ===
 -- test for TRUNCATE
 -- ===
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 17dba77d7e..8371d16c6a 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -2226,6 +2226,7 @@ drop table loct2;
 -- test COPY FROM
 -- ===
 
+alter server loopback options (add batch_size '2');
 create table loc2 (f1 int, f2 text);
 alter table loc2 set (autovacuum_enabled = 'false');
create foreign table rem2 (f1 int, f2 text) server loopback options(table_name 'loc2');
@@ -2258,6 +2259,23 @@ alter table loc2 drop constraint loc2_f1positive;
 
 delete from rem2;
 
+create table foo (a int) partition by list (a);
+create table foo1 

Re: Fast COPY FROM based on batch insert

2021-06-07 Thread Andrey Lepikhov

On 4/6/21 13:45, tsunakawa.ta...@fujitsu.com wrote:

From: Andrey Lepikhov 

We still have a slow 'COPY FROM' operation for foreign tables in current master.
Now we have a foreign batch insert operation, and I tried to rewrite the patch
[1] with this machinery.


I haven't looked at the patch, but nice performance.

However, I see the following problems.  What do you think about them?

I agree with your fears.
Think of this patch as an intermediate step on the way to a fast COPY 
FROM. It contains all the logic of the previous patch, except for the 
transport machinery (the bulk-insert API).
It may be simpler to understand the advantages of the proposed 'COPY' 
FDW API once a 'COPY FROM ...' feature based on the bulk-insert FDW API 
has been committed.


--
regards,
Andrey Lepikhov
Postgres Professional




RE: Fast COPY FROM based on batch insert

2021-06-04 Thread tsunakawa.ta...@fujitsu.com
From: Andrey Lepikhov 
> We still have a slow 'COPY FROM' operation for foreign tables in current master.
> Now we have a foreign batch insert operation, and I tried to rewrite the patch
> [1] with this machinery.

I haven't looked at the patch, but nice performance.

However, I see the following problems.  What do you think about them?

1)
No wonder the user would think, "Why are INSERTs being run on the remote 
server?  I ran COPY."


2)
Without an FDW API for COPY, other FDWs won't get a chance to optimize for 
bulk data loading.  For example, oracle_fdw might use a conventional-path 
insert for the FDW batch insert, and a direct-path insert for the FDW COPY.
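For illustration only, such an API could be a set of optional callbacks alongside the existing batch-insert one. None of these callbacks exist in PostgreSQL and the names are purely hypothetical; the sketch only shows how an FDW providing COPY support could be preferred, with a fallback to batch insert otherwise:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical routine table; an FDW like oracle_fdw could route the
 * COPY callbacks to a direct-path load while keeping conventional
 * INSERTs behind the batch-insert callback. */
typedef struct HypotheticalFdwRoutine
{
    void (*BeginForeignCopy)(void *state);
    void (*ForeignCopyRow)(void *state, const char *row);
    void (*EndForeignCopy)(void *state);
    int  (*ExecForeignBatchInsert)(void *state, const char **rows, int nrows);
} HypotheticalFdwRoutine;

/* Core would take the COPY path only when the FDW fills in all three
 * COPY callbacks; otherwise it falls back to batch insert. */
static int
use_copy_path(const HypotheticalFdwRoutine *routine)
{
    return routine->BeginForeignCopy != NULL &&
           routine->ForeignCopyRow != NULL &&
           routine->EndForeignCopy != NULL;
}

/* Stub callbacks so the two example routine tables below are complete. */
static void stub_begin(void *state) { (void) state; }
static void stub_row(void *state, const char *row) { (void) state; (void) row; }
static void stub_end(void *state) { (void) state; }

static const HypotheticalFdwRoutine copy_capable = {
    stub_begin, stub_row, stub_end, NULL
};
static const HypotheticalFdwRoutine batch_only = {
    NULL, NULL, NULL, NULL
};
```

Making the callbacks optional keeps existing FDWs working unchanged while letting a COPY-aware FDW choose its fastest load path.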


3)
INSERT and COPY in Postgres differ in whether rules are invoked:

https://www.postgresql.org/docs/devel/sql-copy.html

"COPY FROM will invoke any triggers and check constraints on the destination 
table. However, it will not invoke rules."


Regards
Takayuki Tsunakawa