Re: POC: postgres_fdw insert batching

2021-02-17 Thread Amit Langote
On Thu, Feb 18, 2021 at 4:36 AM Tomas Vondra
 wrote:
> On 2/17/21 9:51 AM, Amit Langote wrote:
> > On Wed, Feb 17, 2021 at 5:46 PM Amit Langote  
> > wrote:
> >> Sorry, I hadn't shared enough details of my investigations when I
> >> originally ran into this.  Such as that I had considered implementing
> >> the use of batching for these inserts too but had given up.
> >>
> >> Now that you mention it, I think I gave a less convincing reason for
> >> why we should avoid doing it at all.  Maybe it would have been more
> >> right to say that it is the core code, not necessarily the FDWs, that
> >> currently fails to deal with the use of batching by the insert
> >> component of a cross-partition update.  Those failures could be
> >> addressed as I'll describe below.
> >>
> >> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
>> to simply use the PgFdwModifyState that is installed to handle the
> >> insert component of a cross-partition update (one can get that one via
> >> aux_fmstate field of the original PgFdwModifyState).  However, even
> >> though that's fine for postgres_fdw to do, what worries (had worried)
> >> me is that it also results in scribbling on ri_BatchSize that the core
> >> code may see to determine what to do with a particular tuple, and I
> >> just have to hope that nodeModifyTable.c doesn't end up doing anything
> >> unwarranted with the original update based on seeing a non-zero
> >> ri_BatchSize.  AFAICS, we are fine on that front.
> >>
> >> That said, there are some deficiencies in the code that have to be
> >> addressed before we can let postgres_fdw do as mentioned above.  For
> >> example, the code in ExecModifyTable() that runs after breaking out of
> >> the loop to insert any remaining batched tuples appears to miss the
> >> tuples batched by such inserts.   Apparently, that is because the
> >> ResultRelInfos used by those inserts are not present in
> >> es_tuple_routing_result_relations.  Turns out I had forgotten that
> >> execPartition.c doesn't add the ResultRelInfos to that list if they
> >> are made by ExecInitModifyTable() for the original update operation
> >> and simply reused by ExecFindPartition() when tuples were routed to
> >> those partitions.  It can be "fixed" by reverting to the original
> >> design in Tsunakawa-san's patch where the tuple routing result
> >> relations were obtained from the PartitionTupleRouting data structure,
> >> which fortunately stores all tuple routing result relations.  (Sorry,
> >> I gave wrong advice in [1] in retrospect.)
> >>
> >>> On a closer look, it seems the problem actually lies in a small
> >>> inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
> >>> former only set batch_size for CMD_INSERT while the latter called the
> >>> BatchSize() for all operations, expecting >= 1 result. So we may either
> >>> relax create_foreign_modify and set batch_size for all DML, or make
> >>> ExecInitRoutingInfo stricter (which is what the patches here do).
> >>
> >> I think we should be fine if we make
> >> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
> >> as described above.  We can be sure that we are not mixing the
> >> information used by the batched insert with that of the original
> >> unbatched update.
> >>
> >>> Is there a reason not to do the first thing, allowing batching of
> >>> inserts during cross-partition updates? I tried to do that, but it
> >>> dawned on me that we can't mix batched and un-batched operations, e.g.
> >>> DELETE + INSERT, because that'd break the order of execution, leading to
> >>> bogus results in case the same row is modified repeatedly, etc.
> >>
> >> Actually, postgres_fdw only supports moving a row into a partition (as
> >> part of a cross-partition update that is) if it has already finished
> >> performing any updates on it.   So there is no worry of rows that are
> >> moved into a partition subsequently getting updated due to the
> >> original command.
> >>
> >> The attached patch implements the changes necessary to make these
> >> inserts use batching too.
> >>
> >> [1] 
> >> https://www.postgresql.org/message-id/CA%2BHiwqEbnhwVJMsukTP-S9Kv1ynC7Da3yuqSPZC0Y7oWWOwoHQ%40mail.gmail.com
> >
> > Oops, I had mistakenly not hit "Reply All".  Attaching the patch again.
> >
>
> Thanks. The patch seems reasonable, but it's a bit too large for a fix,
> so I'll go ahead and push one of the previous fixes restricting batching
> to plain INSERT commands. But this seems useful, so I suggest adding it
> to the next commitfest.

Okay, will post the rebased patch to a new thread.
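
The approach described in the quoted text above, i.e. teaching
postgresGetForeignModifyBatchSize() to pick up the insert-side state via
aux_fmstate, might look roughly like this (a hypothetical sketch based on
the struct fields named in the thread, not the committed code):

static int
postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
	int			batch_size;
	PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

	/*
	 * If an auxiliary state was installed to handle the insert component
	 * of a cross-partition update, base the decision on that state, not
	 * on the original (unbatched) update's state.
	 */
	if (fmstate && fmstate->aux_fmstate)
		fmstate = fmstate->aux_fmstate;

	/* Use the batch size computed when the state was created, if any. */
	batch_size = fmstate ? fmstate->batch_size : 1;

	return Max(batch_size, 1);
}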

> One thing that surprised me is that we only move the rows *to* the
> foreign partition, not from it (even on pg13, i.e. before the batching
> etc.). I mean, using the example you posted earlier, with one foreign
> and one local partition, consider this:
>
>   delete from p;
>   insert into p values (2);
>
  test=# select * from p2;
   a
  ---
   2
  (1 row)
>
>   test=# u

Re: POC: postgres_fdw insert batching

2021-02-17 Thread Amit Langote
On Thu, Feb 18, 2021 at 8:38 AM Tomas Vondra
 wrote:
> On 2/17/21 8:36 PM, Tomas Vondra wrote:
> > Thanks. The patch seems reasonable, but it's a bit too large for a fix,
> > so I'll go ahead and push one of the previous fixes restricting batching
> > to plain INSERT commands. But this seems useful, so I suggest adding it
> > to the next commitfest.
>
> I've pushed the v4 fix, adding the CMD_INSERT check to execPartition.c.
>
> I think we may need to revise the relationship between FDW and places
> that (may) call GetForeignModifyBatchSize. Currently, these places need
> to be quite well synchronized - in a way, the issue was (partially) due
> to postgres_fdw and core not agreeing on the details.

Agreed.

> In particular, create_foreign_modify sets batch_size for CMD_INSERT and
> leaves it 0 otherwise. So GetForeignModifyBatchSize() returned 0 later,
> triggering an assert.
>
> It's probably better to require GetForeignModifyBatchSize() to always
> return a valid batch size (>= 1). If batching is not supported, just
> return 1.

That makes sense.

How about the attached?

-- 
Amit Langote
EDB: http://www.enterprisedb.com


ForeignModifyBatchSize-sanity.patch
Description: Binary data


Re: POC: postgres_fdw insert batching

2021-02-17 Thread Tomas Vondra
On 2/17/21 8:36 PM, Tomas Vondra wrote:
>
> ...
> 
> Thanks. The patch seems reasonable, but it's a bit too large for a fix,
> so I'll go ahead and push one of the previous fixes restricting batching
> to plain INSERT commands. But this seems useful, so I suggest adding it
> to the next commitfest.

I've pushed the v4 fix, adding the CMD_INSERT check to execPartition.c.

I think we may need to revise the relationship between FDW and places
that (may) call GetForeignModifyBatchSize. Currently, these places need
to be quite well synchronized - in a way, the issue was (partially) due
to postgres_fdw and core not agreeing on the details.

In particular, create_foreign_modify sets batch_size for CMD_INSERT and
leaves it 0 otherwise. So GetForeignModifyBatchSize() returned 0 later,
triggering an assert.

It's probably better to require GetForeignModifyBatchSize() to always
return a valid batch size (>= 1). If batching is not supported, just
return 1.
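
Under that convention, an FDW without batching support could satisfy the
contract trivially. A minimal sketch (myfdwGetForeignModifyBatchSize is a
made-up callback name, not an existing one):

static int
myfdwGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
{
	/*
	 * This FDW does not batch inserts, so always report a batch size of
	 * 1; the executor then inserts rows one at a time, and callers can
	 * safely assume ri_BatchSize >= 1.
	 */
	return 1;
}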


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-02-17 Thread Tomas Vondra
On 2/17/21 9:51 AM, Amit Langote wrote:
> On Wed, Feb 17, 2021 at 5:46 PM Amit Langote  wrote:
>> On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra
>>  wrote:
>>> On 2/16/21 10:25 AM, Amit Langote wrote:
 On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
> Thanks for the patch, it seems fine to me.

 Thanks for checking.

> I wonder if the commit
> message needs some tweaks, though. At the moment it says:
>
>  Prevent FDW insert batching during cross-partition updates
>
> but what the patch seems to be doing is simply initializing the info
> only for CMD_INSERT operations. Which does the trick, but it affects
> everything, i.e. all updates, no? Not just cross-partition updates.

 You're right.  Please check the message in the updated patch.
>>>
>>> Thanks. I'm not sure I understand what "FDW may not be able to handle
>>> both the original update operation and the batched insert operation
>>> being performed at the same time" means. I mean, if we translate the
>>> UPDATE into DELETE+INSERT, then we don't run both the update and insert
>>> at the same time, right? What exactly is the problem with allowing
>>> batching for inserts in cross-partition updates?
>>
>> Sorry, I hadn't shared enough details of my investigations when I
>> originally ran into this.  Such as that I had considered implementing
>> the use of batching for these inserts too but had given up.
>>
>> Now that you mention it, I think I gave a less convincing reason for
>> why we should avoid doing it at all.  Maybe it would have been more
>> right to say that it is the core code, not necessarily the FDWs, that
>> currently fails to deal with the use of batching by the insert
>> component of a cross-partition update.  Those failures could be
>> addressed as I'll describe below.
>>
>> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
>> to simply use the PgFdwModifyState that is installed to handle the
>> insert component of a cross-partition update (one can get that one via
>> aux_fmstate field of the original PgFdwModifyState).  However, even
>> though that's fine for postgres_fdw to do, what worries (had worried)
>> me is that it also results in scribbling on ri_BatchSize that the core
>> code may see to determine what to do with a particular tuple, and I
>> just have to hope that nodeModifyTable.c doesn't end up doing anything
>> unwarranted with the original update based on seeing a non-zero
>> ri_BatchSize.  AFAICS, we are fine on that front.
>>
>> That said, there are some deficiencies in the code that have to be
>> addressed before we can let postgres_fdw do as mentioned above.  For
>> example, the code in ExecModifyTable() that runs after breaking out of
>> the loop to insert any remaining batched tuples appears to miss the
>> tuples batched by such inserts.   Apparently, that is because the
>> ResultRelInfos used by those inserts are not present in
>> es_tuple_routing_result_relations.  Turns out I had forgotten that
>> execPartition.c doesn't add the ResultRelInfos to that list if they
>> are made by ExecInitModifyTable() for the original update operation
>> and simply reused by ExecFindPartition() when tuples were routed to
>> those partitions.  It can be "fixed" by reverting to the original
>> design in Tsunakawa-san's patch where the tuple routing result
>> relations were obtained from the PartitionTupleRouting data structure,
>> which fortunately stores all tuple routing result relations.  (Sorry,
>> I gave wrong advice in [1] in retrospect.)
>>
>>> On a closer look, it seems the problem actually lies in a small
>>> inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
>>> former only set batch_size for CMD_INSERT while the latter called the
>>> BatchSize() for all operations, expecting >= 1 result. So we may either
>>> relax create_foreign_modify and set batch_size for all DML, or make
>>> ExecInitRoutingInfo stricter (which is what the patches here do).
>>
>> I think we should be fine if we make
>> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
>> as described above.  We can be sure that we are not mixing the
>> information used by the batched insert with that of the original
>> unbatched update.
>>
>>> Is there a reason not to do the first thing, allowing batching of
>>> inserts during cross-partition updates? I tried to do that, but it
>>> dawned on me that we can't mix batched and un-batched operations, e.g.
>>> DELETE + INSERT, because that'd break the order of execution, leading to
>>> bogus results in case the same row is modified repeatedly, etc.
>>
>> Actually, postgres_fdw only supports moving a row into a partition (as
>> part of a cross-partition update that is) if it has already finished
>> performing any updates on it.   So there is no worry of rows that are
>> moved into a partition subsequently getting updated due to the
>> original command.
>>
>> The attached patch implements the changes necessary 

Re: POC: postgres_fdw insert batching

2021-02-17 Thread Amit Langote
On Wed, Feb 17, 2021 at 5:46 PM Amit Langote  wrote:
> On Wed, Feb 17, 2021 at 12:04 AM Tomas Vondra
>  wrote:
> > On 2/16/21 10:25 AM, Amit Langote wrote:
> > > On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
> > >> Thanks for the patch, it seems fine to me.
> > >
> > > Thanks for checking.
> > >
> > >> I wonder if the commit
> > >> message needs some tweaks, though. At the moment it says:
> > >>
> > >>  Prevent FDW insert batching during cross-partition updates
> > >>
> > >> but what the patch seems to be doing is simply initializing the info
> > >> only for CMD_INSERT operations. Which does the trick, but it affects
> > >> everything, i.e. all updates, no? Not just cross-partition updates.
> > >
> > > You're right.  Please check the message in the updated patch.
> >
> > Thanks. I'm not sure I understand what "FDW may not be able to handle
> > both the original update operation and the batched insert operation
> > being performed at the same time" means. I mean, if we translate the
> > UPDATE into DELETE+INSERT, then we don't run both the update and insert
> > at the same time, right? What exactly is the problem with allowing
> > batching for inserts in cross-partition updates?
>
> Sorry, I hadn't shared enough details of my investigations when I
> originally ran into this.  Such as that I had considered implementing
> the use of batching for these inserts too but had given up.
>
> Now that you mention it, I think I gave a less convincing reason for
> why we should avoid doing it at all.  Maybe it would have been more
> right to say that it is the core code, not necessarily the FDWs, that
> currently fails to deal with the use of batching by the insert
> component of a cross-partition update.  Those failures could be
> addressed as I'll describe below.
>
> For postgres_fdw, postgresGetForeignModifyBatchSize() could be taught
> to simply use the PgFdwModifyState that is installed to handle the
> insert component of a cross-partition update (one can get that one via
> aux_fmstate field of the original PgFdwModifyState).  However, even
> though that's fine for postgres_fdw to do, what worries (had worried)
> me is that it also results in scribbling on ri_BatchSize that the core
> code may see to determine what to do with a particular tuple, and I
> just have to hope that nodeModifyTable.c doesn't end up doing anything
> unwarranted with the original update based on seeing a non-zero
> ri_BatchSize.  AFAICS, we are fine on that front.
>
> That said, there are some deficiencies in the code that have to be
> addressed before we can let postgres_fdw do as mentioned above.  For
> example, the code in ExecModifyTable() that runs after breaking out of
> the loop to insert any remaining batched tuples appears to miss the
> tuples batched by such inserts.   Apparently, that is because the
> ResultRelInfos used by those inserts are not present in
> es_tuple_routing_result_relations.  Turns out I had forgotten that
> execPartition.c doesn't add the ResultRelInfos to that list if they
> are made by ExecInitModifyTable() for the original update operation
> and simply reused by ExecFindPartition() when tuples were routed to
> those partitions.  It can be "fixed" by reverting to the original
> design in Tsunakawa-san's patch where the tuple routing result
> relations were obtained from the PartitionTupleRouting data structure,
> which fortunately stores all tuple routing result relations.  (Sorry,
> I gave wrong advice in [1] in retrospect.)
>
> > On a closer look, it seems the problem actually lies in a small
> > inconsistency between create_foreign_modify and ExecInitRoutingInfo. The
> > former only set batch_size for CMD_INSERT while the latter called the
> > BatchSize() for all operations, expecting >= 1 result. So we may either
> > relax create_foreign_modify and set batch_size for all DML, or make
> > ExecInitRoutingInfo stricter (which is what the patches here do).
>
> I think we should be fine if we make
> postgresGetForeignModifyBatchSize() use the correct PgFdwModifyState
> as described above.  We can be sure that we are not mixing the
> information used by the batched insert with that of the original
> unbatched update.
>
> > Is there a reason not to do the first thing, allowing batching of
> > inserts during cross-partition updates? I tried to do that, but it
> > dawned on me that we can't mix batched and un-batched operations, e.g.
> > DELETE + INSERT, because that'd break the order of execution, leading to
> > bogus results in case the same row is modified repeatedly, etc.
>
> Actually, postgres_fdw only supports moving a row into a partition (as
> part of a cross-partition update that is) if it has already finished
> performing any updates on it.   So there is no worry of rows that are
> moved into a partition subsequently getting updated due to the
> original command.
>
> The attached patch implements the changes necessary to make these
> inserts use batching too.
>
> [1] 
> https://www.postgresql.org/mes

Re: POC: postgres_fdw insert batching

2021-02-16 Thread Tomas Vondra




On 2/16/21 10:25 AM, Amit Langote wrote:

On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
 wrote:

On 2/5/21 3:52 AM, Amit Langote wrote:

Tsunakawa-san,

On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.ta...@fujitsu.com
 wrote:

From: Amit Langote 

Yes, it can be simplified by using a local join to prevent the update of the
foreign partition from being pushed to the remote server, for which my
example in the previous email used a local trigger.  Note that the update
of the foreign partition to be done locally is a prerequisite for this bug
to occur.


Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it 
partway.  Good catch (and my bad miss.)


It appears I had missed your reply, sorry.


+   PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+   (PgFdwModifyState *) resultRelInfo->ri_FdwState :
+   NULL;

This can be written as:

+   PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;


Facepalm, yes.

Patch updated.  Thanks for the review.



Thanks for the patch, it seems fine to me.


Thanks for checking.


I wonder if the commit
message needs some tweaks, though. At the moment it says:

 Prevent FDW insert batching during cross-partition updates

but what the patch seems to be doing is simply initializing the info
only for CMD_INSERT operations. Which does the trick, but it affects
everything, i.e. all updates, no? Not just cross-partition updates.


You're right.  Please check the message in the updated patch.



Thanks. I'm not sure I understand what "FDW may not be able to handle 
both the original update operation and the batched insert operation 
being performed at the same time" means. I mean, if we translate the 
UPDATE into DELETE+INSERT, then we don't run both the update and insert 
at the same time, right? What exactly is the problem with allowing 
batching for inserts in cross-partition updates?


On a closer look, it seems the problem actually lies in a small 
inconsistency between create_foreign_modify and ExecInitRoutingInfo. The 
former only set batch_size for CMD_INSERT while the latter called the 
BatchSize() for all operations, expecting >= 1 result. So we may either 
relax create_foreign_modify and set batch_size for all DML, or make 
ExecInitRoutingInfo stricter (which is what the patches here do).


Is there a reason not to do the first thing, allowing batching of 
inserts during cross-partition updates? I tried to do that, but it 
dawned on me that we can't mix batched and un-batched operations, e.g. 
DELETE + INSERT, because that'd break the order of execution, leading to 
bogus results in case the same row is modified repeatedly, etc.


Am I getting this right?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-02-16 Thread Amit Langote
On Tue, Feb 16, 2021 at 1:36 AM Tomas Vondra
 wrote:
> On 2/5/21 3:52 AM, Amit Langote wrote:
> > Tsunakawa-san,
> >
> > On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.ta...@fujitsu.com
> >  wrote:
> >> From: Amit Langote 
> >>> Yes, it can be simplified by using a local join to prevent the update
> >>> of the foreign partition from being pushed to the remote server, for
> >>> which my example in the previous email used a local trigger.  Note that
> >>> the update of the foreign partition to be done locally is a prerequisite
> >>> for this bug to occur.
> >>
> >> Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it 
> >> partway.  Good catch (and my bad miss.)
> >
> > It appears I had missed your reply, sorry.
> >
> >> +   PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
> >> +   (PgFdwModifyState *) resultRelInfo->ri_FdwState :
> >> +   NULL;
> >>
> >> This can be written as:
> >>
> >> +   PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
> >
> > Facepalm, yes.
> >
> > Patch updated.  Thanks for the review.
> >
>
> Thanks for the patch, it seems fine to me.

Thanks for checking.

> I wonder if the commit
> message needs some tweaks, though. At the moment it says:
>
> Prevent FDW insert batching during cross-partition updates
>
> but what the patch seems to be doing is simply initializing the info
> only for CMD_INSERT operations. Which does the trick, but it affects
> everything, i.e. all updates, no? Not just cross-partition updates.

You're right.  Please check the message in the updated patch.


--
Amit Langote
EDB: http://www.enterprisedb.com
From b1d470fc764279ba12787271a04015a123d20b4f Mon Sep 17 00:00:00 2001
From: amitlan 
Date: Sat, 23 Jan 2021 16:35:21 +0900
Subject: [PATCH v4] Fix tuple routing to initialize batching only for inserts

Currently, the insert component of a cross-partition update of a
partitioned table (internally implemented as a delete followed by an
insert) inadvertently ends up using batching for the insert. That may
pose a problem if the insert target is a foreign partition, because
the partition's FDW may not be able to handle both the original update
operation and the batched insert operation being performed at the same
time.  So tighten up the check in ExecInitRoutingInfo() to initialize
batching only if the query's original operation is also INSERT.
---
 .../postgres_fdw/expected/postgres_fdw.out| 23 ++-
 contrib/postgres_fdw/postgres_fdw.c   | 13 +--
 contrib/postgres_fdw/sql/postgres_fdw.sql | 19 ++-
 src/backend/executor/execPartition.c  |  3 ++-
 4 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 60c7e115d6..3326f1b542 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -9414,5 +9414,26 @@ SELECT COUNT(*) FROM batch_table;
 66
 (1 row)
 
+-- Check that enabling batched inserts doesn't interfere with cross-partition
+-- updates
+CREATE TABLE batch_cp_upd_test (a int) PARTITION BY LIST (a);
+CREATE TABLE batch_cp_upd_test1 (LIKE batch_cp_upd_test);
+CREATE FOREIGN TABLE batch_cp_upd_test1_f
+	PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (1)
+	SERVER loopback
+	OPTIONS (table_name 'batch_cp_upd_test1', batch_size '10');
+CREATE TABLE batch_cp_up_test1 PARTITION OF batch_cp_upd_test
+	FOR VALUES IN (2);
+INSERT INTO batch_cp_upd_test VALUES (1), (2);
+-- The following moves a row from the local partition to the foreign one
+UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;
+SELECT tableoid::regclass, * FROM batch_cp_upd_test;
+       tableoid       | a 
+----------------------+---
+ batch_cp_upd_test1_f | 1
+ batch_cp_upd_test1_f | 1
+(2 rows)
+
 -- Clean up
-DROP TABLE batch_table CASCADE;
+DROP TABLE batch_table, batch_cp_upd_test CASCADE;
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 368997d9d1..35b48575c5 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -1934,17 +1934,26 @@ static int
 postgresGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
 {
 	int	batch_size;
+	PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+			(PgFdwModifyState *) resultRelInfo->ri_FdwState :
+			NULL;
 
 	/* should be called only once */
 	Assert(resultRelInfo->ri_BatchSize == 0);
 
+	/*
+	 * Should never get called when the insert is being performed as part of
+	 * a row movement operation.
+	 */
+	Assert(fmstate == NULL || fmstate->aux_fmstate == NULL);
+
 	/*
 	 * In EXPLAIN without ANALYZE, ri_fdwstate is NULL, so we have to lookup
 	 * the option directly in server/table options. Otherwise just use the
 	 * value we determ

Re: POC: postgres_fdw insert batching

2021-02-15 Thread Tomas Vondra
On 2/5/21 3:52 AM, Amit Langote wrote:
> Tsunakawa-san,
> 
> On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.ta...@fujitsu.com
>  wrote:
>> From: Amit Langote 
>>> Yes, it can be simplified by using a local join to prevent the update of
>>> the foreign partition from being pushed to the remote server, for which
>>> my example in the previous email used a local trigger.  Note that the
>>> update of the foreign partition to be done locally is a prerequisite for
>>> this bug to occur.
>>
>> Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it 
>> partway.  Good catch (and my bad miss.)
> 
> It appears I had missed your reply, sorry.
> 
>> +   PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
>> +   (PgFdwModifyState *) resultRelInfo->ri_FdwState :
>> +   NULL;
>>
>> This can be written as:
>>
>> +   PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
> 
> Facepalm, yes.
> 
> Patch updated.  Thanks for the review.
> 

Thanks for the patch, it seems fine to me. I wonder if the commit
message needs some tweaks, though. At the moment it says:

Prevent FDW insert batching during cross-partition updates

but what the patch seems to be doing is simply initializing the info
only for CMD_INSERT operations. Which does the trick, but it affects
everything, i.e. all updates, no? Not just cross-partition updates.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-02-15 Thread Tomas Vondra
On 2/5/21 2:55 AM, Ian Lawrence Barwick wrote:
> ...
> 
> There's a minor typo in the doc's version of the
> ExecForeignBatchInsert() declaration;
> is:
> 
>     TupleTableSlot **
>     ExecForeignBatchInsert(EState *estate,
>                       ResultRelInfo *rinfo,
>                       TupleTableSlot **slots,
>                       TupleTableSlot *planSlots,
>                       int *numSlots);
> 
> should be:
> 
>     TupleTableSlot **
>     ExecForeignBatchInsert(EState *estate,
>                       ResultRelInfo *rinfo,
>                       TupleTableSlot **slots,
>                       TupleTableSlot **planSlots,
>                       int *numSlots);
> 
> (Trivial patch attached).
> 
> 
> Forgot to mention the relevant doc link:
> 
>    
> https://www.postgresql.org/docs/devel/fdw-callbacks.html#FDW-CALLBACKS-UPDATE
> 
> 

Thanks, I'll get this fixed.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: POC: postgres_fdw insert batching

2021-02-04 Thread tsunakawa.ta...@fujitsu.com
From: Amit Langote 
> It appears I had missed your reply, sorry.
> 
> > +   PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
> > +   (PgFdwModifyState *) resultRelInfo->ri_FdwState :
> > +   NULL;
> >
> > This can be written as:
> >
> > +   PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;
> 
> Facepalm, yes.
> 
> Patch updated.  Thanks for the review.

Thank you for picking this up.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-02-04 Thread Amit Langote
Tsunakawa-san,

On Mon, Jan 25, 2021 at 1:21 PM tsunakawa.ta...@fujitsu.com
 wrote:
> From: Amit Langote 
> > Yes, it can be simplified by using a local join to prevent the update of
> > the foreign partition from being pushed to the remote server, for which
> > my example in the previous email used a local trigger.  Note that the
> > update of the foreign partition to be done locally is a prerequisite for
> > this bug to occur.
>
> Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it 
> partway.  Good catch (and my bad miss.)

It appears I had missed your reply, sorry.

> +   PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
> +   (PgFdwModifyState *) resultRelInfo->ri_FdwState :
> +   NULL;
>
> This can be written as:
>
> +   PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;

Facepalm, yes.

Patch updated.  Thanks for the review.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v3-0001-Prevent-FDW-insert-batching-during-cross-partitio.patch
Description: Binary data


Re: POC: postgres_fdw insert batching

2021-02-04 Thread Ian Lawrence Barwick
On Fri, Jan 22, 2021 at 14:50 Ian Lawrence Barwick wrote:

> Hi
>
> On Thu, Jan 21, 2021 at 8:00 Tomas Vondra wrote:
>
>> OK, pushed after a little bit of additional polishing (mostly comments).
>>
>> Thanks everyone!
>>
>
> There's a minor typo in the doc's version of the ExecForeignBatchInsert()
> declaration;
> is:
>
> TupleTableSlot **
> ExecForeignBatchInsert(EState *estate,
>   ResultRelInfo *rinfo,
>   TupleTableSlot **slots,
>   TupleTableSlot *planSlots,
>   int *numSlots);
>
> should be:
>
> TupleTableSlot **
> ExecForeignBatchInsert(EState *estate,
>   ResultRelInfo *rinfo,
>   TupleTableSlot **slots,
>   TupleTableSlot **planSlots,
>   int *numSlots);
>
> (Trivial patch attached).
>
>
Forgot to mention the relevant doc link:


https://www.postgresql.org/docs/devel/fdw-callbacks.html#FDW-CALLBACKS-UPDATE

Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com


Re: POC: postgres_fdw insert batching

2021-02-04 Thread Ian Lawrence Barwick
On Fri, Jan 22, 2021 at 14:50 Ian Lawrence Barwick wrote:

> Hi
>
> On Thu, Jan 21, 2021 at 8:00 Tomas Vondra wrote:
>
>> OK, pushed after a little bit of additional polishing (mostly comments).
>>
>> Thanks everyone!
>>
>
> There's a minor typo in the doc's version of the ExecForeignBatchInsert()
> declaration;
> is:
>
> TupleTableSlot **
> ExecForeignBatchInsert(EState *estate,
>   ResultRelInfo *rinfo,
>   TupleTableSlot **slots,
>   TupleTableSlot *planSlots,
>   int *numSlots);
>
> should be:
>
> TupleTableSlot **
> ExecForeignBatchInsert(EState *estate,
>   ResultRelInfo *rinfo,
>   TupleTableSlot **slots,
>   TupleTableSlot **planSlots,
>   int *numSlots);
>
> (Trivial patch attached).
>
>
> Regards
>
> Ian Barwick
>
> --
> EnterpriseDB: https://www.enterprisedb.com
>
>

-- 
EnterpriseDB: https://www.enterprisedb.com


RE: POC: postgres_fdw insert batching

2021-01-24 Thread tsunakawa.ta...@fujitsu.com
From: Amit Langote 
> Yes, it can be simplified by using a local join to prevent the update of
> the foreign partition from being pushed to the remote server, for which
> my example in the previous email used a local trigger.  Note that the
> update of the foreign partition to be done locally is a prerequisite for
> this bug to occur.

Thank you, I was aware that UPDATE calls ExecInsert() but forgot about it 
partway.  Good catch (and my bad miss.)


+   PgFdwModifyState *fmstate = resultRelInfo->ri_FdwState ?
+   (PgFdwModifyState *) resultRelInfo->ri_FdwState :
+   NULL;

This can be written as:

+   PgFdwModifyState *fmstate = (PgFdwModifyState *) resultRelInfo->ri_FdwState;


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-24 Thread Amit Langote
On Sun, Jan 24, 2021 at 2:17 AM Tomas Vondra
 wrote:
> On 1/23/21 9:31 AM, Amit Langote wrote:
> > I was looking at this and it looks like we've got a problematic case
> > where postgresGetForeignModifyBatchSize() is called from
> > ExecInitRoutingInfo().
> >
> > That case is when the insert is performed as part of a cross-partition
> > update of a partitioned table containing postgres_fdw foreign table
> > partitions.  Because we don't check the operation in
> > ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
> > inserts attempt to use batching.  However the ResultRelInfo may be one
> > for the original update operation, so ri_FdwState contains a
> > PgFdwModifyState with batch_size set to 0, because updates don't
> > support batching.  As things stand now,
> > postgresGetForeignModifyBatchSize() simply returns that, tripping the
> > following Assert in the caller.
> >
> > Assert(partRelInfo->ri_BatchSize >= 1);
> >
> > Use this example to see the crash:
> >
> > create table p (a int) partition by list (a);
> > create table p1 (like p);
> > create extension postgres_fdw;
> > create server lb foreign data wrapper postgres_fdw ;
> > create user mapping for current_user server lb;
> > create foreign table fp1 (a int) server lb options (table_name 'p1');
> > alter table p attach partition fp1 for values in (1);
> > create or replace function report_trig_details() returns trigger as $$
> > begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
> > 'DELETE' then return old; end if; return new; end; $$ language
> > plpgsql;
> > create trigger trig before update on fp1 for each row execute function
> > report_trig_details();
> > create table p2 partition of p for values in (2);
> > insert into p values (2);
> > update p set a = 1;  -- crashes
> >
> > So let's check mtstate->operation == CMD_INSERT in
> > ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
> > in cross-partition update situations where mtstate->operation would be
> > CMD_UPDATE.
> >
> > I've attached a patch.
>
> Thanks for catching this. I think it'd be good if the fix included a
> regression test. The example seems like a good starting point, not sure
> if it can be simplified further.

Yes, it can be simplified by using a local join to prevent the update
of the foreign partition from being pushed to the remote server, for
which my example in the previous email used a local trigger.  Note
that the update of the foreign partition to be done locally is a
prerequisite for this bug to occur.

I've added that simplified test case in the attached updated patch.
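
For reference, the v4 version of this test (quoted in full elsewhere in
this thread) drives the row movement with a local join instead of a
trigger:

INSERT INTO batch_cp_upd_test VALUES (1), (2);
-- The join keeps the UPDATE from being pushed down to the remote server,
-- so the row movement is performed locally
UPDATE batch_cp_upd_test t SET a = 1 FROM (VALUES (1), (2)) s(a) WHERE t.a = s.a;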

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v2-0001-Prevent-FDW-insert-batching-during-cross-partitio.patch
Description: Binary data


Re: POC: postgres_fdw insert batching

2021-01-23 Thread Tomas Vondra




On 1/23/21 9:31 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
 wrote:

On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:

From: Tomas Vondra 

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the
right fix to push.


Thanks to the outer check for operation ==  CMD_INSERT, the inner one became 
unnecessary.



Right. I've pushed the fix, hopefully buildfarm will get happy again.


I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions.  Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching.  However the ResultRelInfo may be one
for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching.  As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig before update on fp1 for each row execute function
report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1;  -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-partition update situations where mtstate->operation would be
CMD_UPDATE.

I've attached a patch.



Thanks for catching this. I think it'd be good if the fix included a 
regression test. The example seems like a good starting point, not sure 
if it can be simplified further.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-23 Thread Zhihong Yu
Amit:
Good catch.

bq. ExecInitRoutingInfo() that is in the charge of initialing

Should be 'ExecInitRoutingInfo() that is in charge of initializing'

Cheers

On Sat, Jan 23, 2021 at 12:31 AM Amit Langote 
wrote:

> On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
>  wrote:
> > On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:
> > > From: Tomas Vondra 
> > >> Right, that's pretty much what I ended up doing (without the CMD_INSERT
> > >> check it'd add batching info to explain for updates too, for example).
> > >> I'll do a bit more testing on the attached patch, but I think that's
> > >> the right fix to push.
> > >
> > > Thanks to the outer check for operation ==  CMD_INSERT, the inner one
> became unnecessary.
> > >
> >
> > Right. I've pushed the fix, hopefully buildfarm will get happy again.
>
> I was looking at this and it looks like we've got a problematic case
> where postgresGetForeignModifyBatchSize() is called from
> ExecInitRoutingInfo().
>
> That case is when the insert is performed as part of a cross-partition
> update of a partitioned table containing postgres_fdw foreign table
> partitions.  Because we don't check the operation in
> ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
> inserts attempt to use batching.  However the ResultRelInfo may be one
> for the original update operation, so ri_FdwState contains a
> PgFdwModifyState with batch_size set to 0, because updates don't
> support batching.  As things stand now,
> postgresGetForeignModifyBatchSize() simply returns that, tripping the
> following Assert in the caller.
>
> Assert(partRelInfo->ri_BatchSize >= 1);
>
> Use this example to see the crash:
>
> create table p (a int) partition by list (a);
> create table p1 (like p);
> create extension postgres_fdw;
> create server lb foreign data wrapper postgres_fdw ;
> create user mapping for current_user server lb;
> create foreign table fp1 (a int) server lb options (table_name 'p1');
> alter table p attach partition fp1 for values in (1);
> create or replace function report_trig_details() returns trigger as $$
> begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
> 'DELETE' then return old; end if; return new; end; $$ language
> plpgsql;
> create trigger trig before update on fp1 for each row execute function
> report_trig_details();
> create table p2 partition of p for values in (2);
> insert into p values (2);
> update p set a = 1;  -- crashes
>
> So let's check mtstate->operation == CMD_INSERT in
> ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
> in cross-partition update situations where mtstate->operation would be
> CMD_UPDATE.
>
> I've attached a patch.
>
>
>
>
> --
> Amit Langote
> EDB: http://www.enterprisedb.com
>


Re: POC: postgres_fdw insert batching

2021-01-23 Thread Amit Langote
On Thu, Jan 21, 2021 at 11:36 AM Tomas Vondra
 wrote:
> On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:
> > From: Tomas Vondra 
> >> Right, that's pretty much what I ended up doing (without the CMD_INSERT
> >> check it'd add batching info to explain for updates too, for example).
> >> I'll do a bit more testing on the attached patch, but I think that's
> >> the right fix to push.
> >
> > Thanks to the outer check for operation ==  CMD_INSERT, the inner one 
> > became unnecessary.
> >
>
> Right. I've pushed the fix, hopefully buildfarm will get happy again.

I was looking at this and it looks like we've got a problematic case
where postgresGetForeignModifyBatchSize() is called from
ExecInitRoutingInfo().

That case is when the insert is performed as part of a cross-partition
update of a partitioned table containing postgres_fdw foreign table
partitions.  Because we don't check the operation in
ExecInitRoutingInfo() when calling GetForeignModifyBatchSize(), such
inserts attempt to use batching.  However the ResultRelInfo may be one
for the original update operation, so ri_FdwState contains a
PgFdwModifyState with batch_size set to 0, because updates don't
support batching.  As things stand now,
postgresGetForeignModifyBatchSize() simply returns that, tripping the
following Assert in the caller.

Assert(partRelInfo->ri_BatchSize >= 1);

Use this example to see the crash:

create table p (a int) partition by list (a);
create table p1 (like p);
create extension postgres_fdw;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create foreign table fp1 (a int) server lb options (table_name 'p1');
alter table p attach partition fp1 for values in (1);
create or replace function report_trig_details() returns trigger as $$
begin raise notice '% % on %', tg_when, tg_op, tg_relname; if tg_op =
'DELETE' then return old; end if; return new; end; $$ language
plpgsql;
create trigger trig before update on fp1 for each row execute function
report_trig_details();
create table p2 partition of p for values in (2);
insert into p values (2);
update p set a = 1;  -- crashes

So let's check mtstate->operation == CMD_INSERT in
ExecInitRoutingInfo() to prevent calling GetForeignModifyBatchSize()
in cross-partition update situations where mtstate->operation would be
CMD_UPDATE.

I've attached a patch.
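
The shape of the fix, as described above, is roughly the following (a
sketch of the idea only; the attached patch is the actual change):

	/*
	 * In ExecInitRoutingInfo(): ask the FDW about batching only when the
	 * query's original operation is an INSERT, so the insert component of
	 * a cross-partition update never tries to use batching.
	 */
	if (mtstate->operation == CMD_INSERT &&
		partRelInfo->ri_FdwRoutine != NULL &&
		partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
		partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
		partRelInfo->ri_BatchSize =
			partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(partRelInfo);
	else
		partRelInfo->ri_BatchSize = 1;

	Assert(partRelInfo->ri_BatchSize >= 1);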




--
Amit Langote
EDB: http://www.enterprisedb.com


0001-Prevent-FDW-insert-batching-during-cross-partition-u.patch
Description: Binary data


Re: POC: postgres_fdw insert batching

2021-01-21 Thread Ian Lawrence Barwick
Hi

On Thu, Jan 21, 2021 at 8:00 Tomas Vondra wrote:

> OK, pushed after a little bit of additional polishing (mostly comments).
>
> Thanks everyone!
>

There's a minor typo in the doc's version of the ExecForeignBatchInsert()
declaration;
is:

TupleTableSlot **
ExecForeignBatchInsert(EState *estate,
  ResultRelInfo *rinfo,
  TupleTableSlot **slots,
  TupleTableSlot *planSlots,
  int *numSlots);

should be:

TupleTableSlot **
ExecForeignBatchInsert(EState *estate,
  ResultRelInfo *rinfo,
  TupleTableSlot **slots,
  TupleTableSlot **planSlots,
  int *numSlots);

(Trivial patch attached).


Regards

Ian Barwick

-- 
EnterpriseDB: https://www.enterprisedb.com
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 854913ae5f..2e73d296d2 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -619,7 +619,7 @@ TupleTableSlot **
 ExecForeignBatchInsert(EState *estate,
   ResultRelInfo *rinfo,
   TupleTableSlot **slots,
-  TupleTableSlot *planSlots,
+  TupleTableSlot **planSlots,
   int *numSlots);
 
 


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

On 1/21/21 3:09 AM, tsunakawa.ta...@fujitsu.com wrote:

From: Tomas Vondra 

Right, that's pretty much what I ended up doing (without the CMD_INSERT
check it'd add batching info to explain for updates too, for example).
I'll do a bit more testing on the attached patch, but I think that's the
right fix to push.


Thanks to the outer check for operation ==  CMD_INSERT, the inner one became 
unnecessary.



Right. I've pushed the fix, hopefully buildfarm will get happy again.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tom Lane 
> The "single target table" could be partitioned, in which case there'll be
> multiple resultrelinfos, some of which could be foreign tables.

Thank you.  I thought so at first, but later I found that ExecInsert() only 
handles one element in mtstate->resultRelInfo.  So I thought just the first 
element is processed in the INSERT case.

I understood (guessed) the for loop is for UPDATE and DELETE.  EXPLAIN without 
ANALYZE UPDATE/DELETE on a partitioned table shows partitions, which would be 
mtstate->resultRelInfo.  EXPLAIN on INSERT doesn't show partitions, so I think 
INSERT will find relevant partitions based on input rows during execution.
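
For instance, with a hypothetical two-partition table (plan output
omitted; this reflects the behavior at the time of this thread):

	CREATE TABLE pt (a int) PARTITION BY LIST (a);
	CREATE TABLE pt1 PARTITION OF pt FOR VALUES IN (1);
	CREATE TABLE pt2 PARTITION OF pt FOR VALUES IN (2);

	EXPLAIN UPDATE pt SET a = a;        -- one subplan per partition
	EXPLAIN INSERT INTO pt VALUES (1);  -- no per-partition subplans; rows
	                                    -- are routed at execution time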


Regards
Takayuki Tsunakawa





RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Right, that's pretty much what I ended up doing (without the CMD_INSERT
> check it'd add batching info to explain for updates too, for example).
> I'll do a bit more testing on the attached patch, but I think that's
> the right fix to push.

Thanks to the outer check for operation ==  CMD_INSERT, the inner one became 
unnecessary.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tom Lane
"tsunakawa.ta...@fujitsu.com"  writes:
> Just for learning, could anyone tell me what this loop is for?  I thought 
> current Postgres's DML supports a single target table, so it's enough to 
> handle the first element of mtstate->resultRelInfo.

The "single target table" could be partitioned, in which case there'll be
multiple resultrelinfos, some of which could be foreign tables.

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra



On 1/21/21 2:53 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra
 wrote:

On 1/21/21 2:24 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
 wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

   if (junk_filter_needed)
   {
   resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?



IMO the issue is that code iterates over all plans and moves to the next
for each one:

   resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.


Accessing mtstate->resultRelInfo directly would do.  The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

  /*
   * Initialize RETURNING projections if needed.
   */
  if (node->returningLists)
  {
  
  /*
   * Build a projection for each result rel.
   */
  resultRelInfo = mtstate->resultRelInfo;
  foreach(l, node->returningLists)
  {
  List   *rlist = (List *) lfirst(l);

  resultRelInfo->ri_returningList = rlist;
  resultRelInfo->ri_projectReturning =
  ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
  resultRelInfo->ri_RelationDesc->rd_att);
  resultRelInfo++;
  }
  }



Right. But I think Tom is right this should initialize ri_BatchSize for
all the resultRelInfo elements, not just the first one. Per the attached
patch, which resolves the issue both on x86_64 and armv7l for me.


+1 in general.  To avoid looping uselessly in the case of
UPDATE/DELETE where batching can't be used today, I'd suggest putting
if (operation == CMD_INSERT) around the loop.



Right, that's pretty much what I ended up doing (without the CMD_INSERT 
check it'd add batching info to explain for updates too, for example). 
I'll do a bit more testing on the attached patch, but I think that's the 
right fix to push.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a4870d621a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2797,18 +2797,30 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * Determine if the FDW supports batch insert and determine the batch
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
+	 *
+	 * We only do this for INSERT, so that for UPDATE/DELETE the value
+	 * remains set to 0.
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	if (operation == CMD_INSERT)
+	{
+		resultRelInfo = mtstate->resultRelInfo;
+		for (i = 0; i < nplans; i++)
+		{
+			if (!resultRelInfo->ri_usesFdwDirectModify &&
+				operation == CMD_INSERT &&
+				resultRelInfo->ri_FdwRoutine != NULL &&
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+				resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+				resultRelInfo->ri_BatchSize =
+					resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+			else
+				resultRelInfo->ri_BatchSize = 1;
+
+			Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+			resultRelInfo++;
+		}
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it


RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Zhihong Yu 
> My first name is Zhihong.

> You can call me Ted if you want to save some typing :-)

Ah, I'm very sorry.  Thank you, let me call you Ted then.  That can't be 
mistaken.

Regards
Takayuki Tsunakawa




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Amit Langote
On Thu, Jan 21, 2021 at 10:42 AM Tomas Vondra
 wrote:
> On 1/21/21 2:24 AM, Amit Langote wrote:
> > On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
> >  wrote:
> >> On 1/21/21 1:17 AM, Zhihong Yu wrote:
> >>> Hi,
> >>> The assignment to resultRelInfo is done when junk_filter_needed is true:
> >>>
> >>>   if (junk_filter_needed)
> >>>   {
> >>>   resultRelInfo = mtstate->resultRelInfo;
> >>>
> >>> Should the code for determining batch size access mtstate->resultRelInfo
> >>> directly ?
> >>>
> >>
> >> IMO the issue is that code iterates over all plans and moves to the next
> >> for each one:
> >>
> >>   resultRelInfo++;
> >>
> >> so it ends up pointing past the last element, hence the failures. So
> >> yeah, either the code needs to move before the loop (per my patch), or
> >> we need to access mtstate->resultRelInfo directly.
> >
> > Accessing mtstate->resultRelInfo directly would do.  The only
> > constraint on where this block should be placed is that
> > ri_projectReturning must be valid as of calling
> > GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
> > So, after this block in ExecInitModifyTable:
> >
> >  /*
> >   * Initialize RETURNING projections if needed.
> >   */
> >  if (node->returningLists)
> >  {
> >  
> >  /*
> >   * Build a projection for each result rel.
> >   */
> >  resultRelInfo = mtstate->resultRelInfo;
> >  foreach(l, node->returningLists)
> >  {
> >  List   *rlist = (List *) lfirst(l);
> >
> >  resultRelInfo->ri_returningList = rlist;
> >  resultRelInfo->ri_projectReturning =
> >  ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
> >  resultRelInfo->ri_RelationDesc->rd_att);
> >  resultRelInfo++;
> >  }
> >  }
> >
>
> Right. But I think Tom is right this should initialize ri_BatchSize for
> all the resultRelInfo elements, not just the first one. Per the attached
> patch, which resolves the issue both on x86_64 and armv7l for me.

+1 in general.  To avoid looping uselessly in the case of
UPDATE/DELETE where batching can't be used today, I'd suggest putting
if (operation == CMD_INSERT) around the loop.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Right. But I think Tom is right this should initialize ri_BatchSize for all 
> the
> resultRelInfo elements, not just the first one. Per the attached patch, which
> resolves the issue both on x86_64 and armv7l for me.

I think your patch is perfect in the sense that it's ready for the future 
multi-target DML support.  +1

Just for learning, could anyone tell me what this loop is for?  I thought current 
Postgres's DML supports a single target table, so it's enough to handle the 
first element of mtstate->resultRelInfo.  In that sense, Amit-san and I agreed 
that we don't put the if block in the for loop yet.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi, Takayuki-san:
My first name is Zhihong.

You can call me Ted if you want to save some typing :-)

Cheers

On Wed, Jan 20, 2021 at 5:37 PM tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

> From: Zhihong Yu 
>
> > Do we need to consider how this part of code inside
> ExecInitModifyTable() would evolve ?
>
>
>
> > I think placing the compound condition toward the end of
> ExecInitModifyTable() is reasonable because it checks the latest
> information.
>
>
>
> +1 for Zaihong-san's idea.  But instead of rewriting every resultRelInfo
> to mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to
> suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately
> before the if block.
>
>
>
> Thanks a lot, all for helping to solve the problem quickly!
>
>
>
>
>
> Regards
>
> Takayuki Tsunakawa
>
>
>
>


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra




On 1/21/21 2:22 AM, Tom Lane wrote:

Tomas Vondra  writes:

I may be wrong, but the most likely explanation seems to be this is due
to the junk filter initialization, which simply moves past the end of
the mtstate->resultRelInfo array.


resultRelInfo is certainly pointing at garbage at that point.



Yup. It's pretty amazing the x86 machines seem to be mostly OK with it.


It kinda seems the GetForeignModifyBatchSize call should happen before
that block. The attached patch fixes this for me (i.e. regression tests
pass with no valgrind reports).



Or did I get that wrong?


Don't we need to initialize ri_BatchSize for *each* resultrelinfo,
not merely the first one?  That is, this new code needs to be
somewhere inside a loop over the result rels.



Yeah, I think you're right. That's an embarrassing oversight :-(


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra



On 1/21/21 2:24 AM, Amit Langote wrote:

On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
 wrote:

On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

  if (junk_filter_needed)
  {
  resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?



IMO the issue is that code iterates over all plans and moves to the next
for each one:

  resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So
yeah, either the code needs to move before the loop (per my patch), or
we need to access mtstate->resultRelInfo directly.


Accessing mtstate->resultRelInfo directly would do.  The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

 /*
  * Initialize RETURNING projections if needed.
  */
 if (node->returningLists)
 {
 
 /*
  * Build a projection for each result rel.
  */
 resultRelInfo = mtstate->resultRelInfo;
 foreach(l, node->returningLists)
 {
 List   *rlist = (List *) lfirst(l);

 resultRelInfo->ri_returningList = rlist;
 resultRelInfo->ri_projectReturning =
 ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
 resultRelInfo->ri_RelationDesc->rd_att);
 resultRelInfo++;
 }
 }



Right. But I think Tom is right that this should initialize ri_BatchSize for 
all the resultRelInfo elements, not just the first one. Per the attached 
patch, which resolves the issue both on x86_64 and armv7l for me.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..10febcae8a 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
-	else
-		resultRelInfo->ri_BatchSize = 1;
+	resultRelInfo = mtstate->resultRelInfo;
+	for (i = 0; i < nplans; i++)
+	{
+		if (!resultRelInfo->ri_usesFdwDirectModify &&
+			operation == CMD_INSERT &&
+			resultRelInfo->ri_FdwRoutine != NULL &&
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+			resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+			resultRelInfo->ri_BatchSize =
+				resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+		else
+			resultRelInfo->ri_BatchSize = 1;
+
+		Assert(resultRelInfo->ri_BatchSize >= 1);
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+		resultRelInfo++;
+	}
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it


RE: POC: postgres_fdw insert batching

2021-01-20 Thread tsunakawa.ta...@fujitsu.com
From: Zhihong Yu 
> Do we need to consider how this part of code inside ExecInitModifyTable() 
> would evolve ?

> I think placing the compound condition toward the end of 
> ExecInitModifyTable() is reasonable because it checks the latest information.

+1 for Zaihong-san's idea.  But instead of rewriting every resultRelInfo to 
mtstate->resultRelInfo, which makes it a bit harder to read, I'd like to 
suggest just adding "resultRelInfo = mtstate->resultRelInfo;" immediately 
before the if block.

Thanks a lot, all for helping to solve the problem quickly!


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-20 Thread Amit Langote
On Thu, Jan 21, 2021 at 9:56 AM Tomas Vondra
 wrote:
> On 1/21/21 1:17 AM, Zhihong Yu wrote:
> > Hi,
> > The assignment to resultRelInfo is done when junk_filter_needed is true:
> >
> >  if (junk_filter_needed)
> >  {
> >  resultRelInfo = mtstate->resultRelInfo;
> >
> > Should the code for determining batch size access mtstate->resultRelInfo
> > directly ?
> >
>
> IMO the issue is that code iterates over all plans and moves to the next
> for each one:
>
>  resultRelInfo++;
>
> so it ends up pointing past the last element, hence the failures. So
> yeah, either the code needs to move before the loop (per my patch), or
> we need to access mtstate->resultRelInfo directly.

Accessing mtstate->resultRelInfo directly would do.  The only
constraint on where this block should be placed is that
ri_projectReturning must be valid as of calling
GetForeignModifyBatchSize(), as Tsunakawa-san pointed out upthread.
So, after this block in ExecInitModifyTable:

/*
 * Initialize RETURNING projections if needed.
 */
if (node->returningLists)
{

/*
 * Build a projection for each result rel.
 */
resultRelInfo = mtstate->resultRelInfo;
foreach(l, node->returningLists)
{
List   *rlist = (List *) lfirst(l);

resultRelInfo->ri_returningList = rlist;
resultRelInfo->ri_projectReturning =
ExecBuildProjectionInfo(rlist, econtext, slot, &mtstate->ps,
resultRelInfo->ri_RelationDesc->rd_att);
resultRelInfo++;
}
}
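
To make the ordering constraint concrete: a GetForeignModifyBatchSize()
implementation may well look at ri_projectReturning, roughly like this
(a simplified sketch, not the actual postgres_fdw code; the function
name and the constant 100 are made up):

    static int
    exampleGetForeignModifyBatchSize(ResultRelInfo *resultRelInfo)
    {
        /* batching cannot be used when each inserted row must be returned */
        if (resultRelInfo->ri_projectReturning != NULL)
            return 1;

        /* otherwise use whatever was configured for the server/table */
        return 100;
    }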

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tom Lane
Tomas Vondra  writes:
> I may be wrong, but the most likely explanation seems to be this is due 
> to the junk filter initialization, which simply moves past the end of 
> the mtstate->resultRelInfo array.

resultRelInfo is certainly pointing at garbage at that point.

> It kinda seems the GetForeignModifyBatchSize call should happen before 
> that block. The attached patch fixes this for me (i.e. regression tests 
> pass with no valgrind reports).

> Or did I get that wrong?

Don't we need to initialize ri_BatchSize for *each* resultrelinfo,
not merely the first one?  That is, this new code needs to be
somewhere inside a loop over the result rels.

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi,
Do we need to consider how this part of code inside ExecInitModifyTable()
would evolve ?

I think placing the compound condition toward the end
of ExecInitModifyTable() is reasonable because it checks the latest
information.

Regards

On Wed, Jan 20, 2021 at 5:11 PM Tomas Vondra 
wrote:

>
>
> On 1/21/21 2:02 AM, Zhihong Yu wrote:
> > Hi, Tomas:
> > In my opinion, my patch is a little better.
> > Suppose one of the conditions in the if block changes in between the
> > start of loop and the end of the loop:
> >
> >   * Determine if the FDW supports batch insert and determine the
> batch
> >   * size (a FDW may support batching, but it may be disabled for the
> >   * server/table).
> >
> > My patch would reflect that change. I guess this was the reason the if /
> > else block was placed there in the first place.
> >
>
> But can it change? All the loop does is extract junk attributes from
> the plans; it does not modify anything related to the batching. Or maybe
> I just don't understand what you mean.
>
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra




On 1/21/21 2:02 AM, Zhihong Yu wrote:

Hi, Tomas:
In my opinion, my patch is a little better.
Suppose one of the conditions in the if block changes in between the 
start of loop and the end of the loop:


      * Determine if the FDW supports batch insert and determine the batch
      * size (a FDW may support batching, but it may be disabled for the
      * server/table).

My patch would reflect that change. I guess this was the reason the if / 
else block was placed there in the first place.




But can it change? All the loop does is extract junk attributes from 
the plans; it does not modify anything related to the batching. Or maybe 
I just don't understand what you mean.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi, Tomas:
In my opinion, my patch is a little better.
Suppose one of the conditions in the if block changes in between the start
of loop and the end of the loop:

 * Determine if the FDW supports batch insert and determine the batch
 * size (a FDW may support batching, but it may be disabled for the
 * server/table).

My patch would reflect that change. I guess this was the reason the if /
else block was placed there in the first place.

Cheers

On Wed, Jan 20, 2021 at 4:56 PM Tomas Vondra 
wrote:

>
>
> On 1/21/21 1:17 AM, Zhihong Yu wrote:
> > Hi,
> > The assignment to resultRelInfo is done when junk_filter_needed is true:
> >
> >  if (junk_filter_needed)
> >  {
> >  resultRelInfo = mtstate->resultRelInfo;
> >
> > Should the code for determining batch size access mtstate->resultRelInfo
> > directly ?
> >
>
> IMO the issue is that code iterates over all plans and moves to the next
> for each one:
>
>  resultRelInfo++;
>
> so it ends up pointing past the last element, hence the failures. So
> yeah, either the code needs to move before the loop (per my patch), or
> we need to access mtstate->resultRelInfo directly.
>
> I'm pretty amazed this did not crash during any of the many regression
> runs I did recently.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra




On 1/21/21 1:17 AM, Zhihong Yu wrote:

Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

         if (junk_filter_needed)
         {
             resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo 
directly ?




IMO the issue is that code iterates over all plans and moves to the next 
for each one:


resultRelInfo++;

so it ends up pointing past the last element, hence the failures. So 
yeah, either the code needs to move before the loop (per my patch), or 
we need to access mtstate->resultRelInfo directly.


I'm pretty amazed this did not crash during any of the many regression 
runs I did recently.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

On 1/21/21 12:59 AM, Tom Lane wrote:

Tomas Vondra  writes:

OK, pushed after a little bit of additional polishing (mostly comments).
Thanks everyone!


florican reports this is seriously broken on 32-bit hardware:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

First guess is incorrect memory-allocation computations ...



I know, although it seems more like an access to uninitialized memory. 
I've already posted a patch that resolves that for me on 64-bit (per 
valgrind; I suppose it's the same issue).


I'm working on reproducing it on 32-bit; hopefully it won't take long.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

On 1/21/21 12:52 AM, Tomas Vondra wrote:

Hmm, seems that florican doesn't like this :-(

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15 



It's an i386 machine running FreeBSD, so I'm not sure what exactly it's picky 
about. But when I tried running this under valgrind, I get some strange 
failures in the new chunk in ExecInitModifyTable:


   /*
    * Determine if the FDW supports batch insert and determine the batch
    * size (a FDW may support batching, but it may be disabled for the
    * server/table).
    */
   if (!resultRelInfo->ri_usesFdwDirectModify &&
   operation == CMD_INSERT &&
   resultRelInfo->ri_FdwRoutine != NULL &&
   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
   resultRelInfo->ri_BatchSize =
   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
   else
   resultRelInfo->ri_BatchSize = 1;

   Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like 
that. I wouldn't be surprised if the 32-bit machine was pickier and 
failing because of that.


A sample of the valgrind log is attached. It's pretty much just 
repetitions of these three reports.




OK, it's definitely accessing uninitialized memory, because the 
resultRelInfo (on line 2801, i.e. the "if" condition) looks like this:


(gdb) p resultRelInfo
$1 = (ResultRelInfo *) 0xe595988
(gdb) p *resultRelInfo
$2 = {type = 2139062142, ri_RangeTableIndex = 2139062143, 
ri_RelationDesc = 0x7f7f7f7f7f7f7f7f, ri_NumIndices = 2139062143, 
ri_IndexRelationDescs = 0x7f7f7f7f7f7f7f7f, ri_IndexRelationInfo = 
0x7f7f7f7f7f7f7f7f,
  ri_TrigDesc = 0x7f7f7f7f7f7f7f7f, ri_TrigFunctions = 
0x7f7f7f7f7f7f7f7f, ri_TrigWhenExprs = 0x7f7f7f7f7f7f7f7f, 
ri_TrigInstrument = 0x7f7f7f7f7f7f7f7f, ri_ReturningSlot = 
0x7f7f7f7f7f7f7f7f, ri_TrigOldSlot = 0x7f7f7f7f7f7f7f7f,
  ri_TrigNewSlot = 0x7f7f7f7f7f7f7f7f, ri_FdwRoutine = 
0x7f7f7f7f7f7f7f7f, ri_FdwState = 0x7f7f7f7f7f7f7f7f, 
ri_usesFdwDirectModify = 127, ri_NumSlots = 2139062143, ri_BatchSize = 
2139062143, ri_Slots = 0x7f7f7f7f7f7f7f7f,
  ri_PlanSlots = 0x7f7f7f7f7f7f7f7f, ri_WithCheckOptions = 
0x7f7f7f7f7f7f7f7f, ri_WithCheckOptionExprs = 0x7f7f7f7f7f7f7f7f, 
ri_ConstraintExprs = 0x7f7f7f7f7f7f7f7f, ri_GeneratedExprs = 
0x7f7f7f7f7f7f7f7f,
  ri_NumGeneratedNeeded = 2139062143, ri_junkFilter = 
0x7f7f7f7f7f7f7f7f, ri_returningList = 0x7f7f7f7f7f7f7f7f, 
ri_projectReturning = 0x7f7f7f7f7f7f7f7f, ri_onConflictArbiterIndexes = 
0x7f7f7f7f7f7f7f7f,
  ri_onConflict = 0x7f7f7f7f7f7f7f7f, ri_PartitionCheckExpr = 
0x7f7f7f7f7f7f7f7f, ri_PartitionRoot = 0x7f7f7f7f7f7f7f7f, 
ri_RootToPartitionMap = 0x8, ri_PartitionTupleSlot = 0x8, 
ri_ChildToRootMap = 0xe5952b0,

  ri_CopyMultiInsertBuffer = 0xe596740}
(gdb)

I may be wrong, but the most likely explanation seems to be this is due 
to the junk filter initialization, which simply moves past the end of 
the mtstate->resultRelInfo array.
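
For illustration, the shape of the problem (a condensed sketch of the
junk-filter loop, not the verbatim source):

    resultRelInfo = mtstate->resultRelInfo;
    for (i = 0; i < nplans; i++)
    {
        /* ... build the junk filter for this result rel ... */
        resultRelInfo++;    /* advance to the next result rel */
    }
    /*
     * resultRelInfo now points one element past the end of the
     * mtstate->resultRelInfo array, so the batch-size block that runs
     * next dereferences uninitialized memory.
     */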


It kinda seems the GetForeignModifyBatchSize call should happen before 
that block. The attached patch fixes this for me (i.e. regression tests 
pass with no valgrind reports).


Or did I get that wrong?

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..2ac4999dc8 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2672,6 +2672,23 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
+	/*
+	 * Determine if the FDW supports batch insert and determine the batch
+	 * size (a FDW may support batching, but it may be disabled for the
+	 * server/table).
+	 */
+	if (!resultRelInfo->ri_usesFdwDirectModify &&
+		operation == CMD_INSERT &&
+		resultRelInfo->ri_FdwRoutine != NULL &&
+		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		resultRelInfo->ri_BatchSize =
+			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+	else
+		resultRelInfo->ri_BatchSize = 1;
+
+	Assert(resultRelInfo->ri_BatchSize >= 1);
+
 	/* select first subplan */
 	mtstate->mt_whichplan = 0;
 	subplan = (Plan *) linitial(node->plans);
@@ -2793,23 +2810,6 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 		}
 	}
 
-	/*
-	 * Determine if the FDW supports batch insert and determine the batch
-	 * size (a FDW may support batching, but it may be disabled for the
-	 * server/table).
-	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
-		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo

Re: POC: postgres_fdw insert batching

2021-01-20 Thread Zhihong Yu
Hi,
The assignment to resultRelInfo is done when junk_filter_needed is true:

if (junk_filter_needed)
{
resultRelInfo = mtstate->resultRelInfo;

Should the code for determining batch size access mtstate->resultRelInfo
directly ?

diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 9c36860704..a6a814454d 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -2798,17 +2798,17 @@ ExecInitModifyTable(ModifyTable *node, EState *estate, int eflags)
 	 * size (a FDW may support batching, but it may be disabled for the
 	 * server/table).
 	 */
-	if (!resultRelInfo->ri_usesFdwDirectModify &&
+	if (!mtstate->resultRelInfo->ri_usesFdwDirectModify &&
 		operation == CMD_INSERT &&
-		resultRelInfo->ri_FdwRoutine != NULL &&
-		resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
-		resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
-		resultRelInfo->ri_BatchSize =
-			resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
+		mtstate->resultRelInfo->ri_FdwRoutine != NULL &&
+		mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+		mtstate->resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
+		mtstate->resultRelInfo->ri_BatchSize =
+			mtstate->resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(mtstate->resultRelInfo);
 	else
-		resultRelInfo->ri_BatchSize = 1;
+		mtstate->resultRelInfo->ri_BatchSize = 1;
 
-	Assert(resultRelInfo->ri_BatchSize >= 1);
+	Assert(mtstate->resultRelInfo->ri_BatchSize >= 1);
 
 	/*
 	 * Lastly, if this is not the primary (canSetTag) ModifyTable node, add it

Cheers

On Wed, Jan 20, 2021 at 3:52 PM Tomas Vondra 
wrote:

> Hmm, seems that florican doesn't like this :-(
>
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15
>
> It's an i386 machine running FreeBSD, so I'm not sure what exactly it's picky
> about. But when I tried running this under valgrind, I get some strange
> failures in the new chunk in ExecInitModifyTable:
>
>/*
> * Determine if the FDW supports batch insert and determine the batch
> * size (a FDW may support batching, but it may be disabled for the
> * server/table).
> */
>if (!resultRelInfo->ri_usesFdwDirectModify &&
>operation == CMD_INSERT &&
>resultRelInfo->ri_FdwRoutine != NULL &&
>resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
>resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
>resultRelInfo->ri_BatchSize =
>resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
>else
>resultRelInfo->ri_BatchSize = 1;
>
>Assert(resultRelInfo->ri_BatchSize >= 1);
>
> It seems as if the resultRelInfo is not initialized, or something like
> that. I wouldn't be surprised if the 32-bit machine was pickier and
> failing because of that.
>
> A sample of the valgrind log is attached. It's pretty much just
> repetitions of these three reports.
>
> regards
>
> --
> Tomas Vondra
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tom Lane
Tomas Vondra  writes:
> OK, pushed after a little bit of additional polishing (mostly comments).
> Thanks everyone!

florican reports this is seriously broken on 32-bit hardware:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

First guess is incorrect memory-allocation computations ...

regards, tom lane




Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

Hmm, seems that florican doesn't like this :-(

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=florican&dt=2021-01-20%2023%3A08%3A15

It's an i386 machine running FreeBSD, so I'm not sure what exactly it's 
picky about. But when I tried running this under valgrind, I get some 
strange failures in the new chunk in ExecInitModifyTable:


  /*
   * Determine if the FDW supports batch insert and determine the batch
   * size (a FDW may support batching, but it may be disabled for the
   * server/table).
   */
  if (!resultRelInfo->ri_usesFdwDirectModify &&
  operation == CMD_INSERT &&
  resultRelInfo->ri_FdwRoutine != NULL &&
  resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
  resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
  resultRelInfo->ri_BatchSize =
  resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
  else
  resultRelInfo->ri_BatchSize = 1;

  Assert(resultRelInfo->ri_BatchSize >= 1);

It seems as if the resultRelInfo is not initialized, or something like 
that. I wouldn't be surprised if the 32-bit machine was pickier and 
failing because of that.


A sample of the valgrind log is attached. It's pretty much just 
repetitions of these three reports.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
==186005== Invalid read of size 1
==186005==at 0x759C12: ExecInitModifyTable (nodeModifyTable.c:2801)
==186005==by 0x720B35: ExecInitNode (execProcnode.c:174)
==186005==by 0x716F99: InitPlan (execMain.c:939)
==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==by 0x9495B1: ProcessQuery (pquery.c:155)
==186005==by 0x94AF24: PortalRunMulti (pquery.c:1267)
==186005==by 0x94A4C8: PortalRun (pquery.c:779)
==186005==by 0x94438E: exec_simple_query (postgres.c:1240)
==186005==by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==by 0x889205: ServerLoop (postmaster.c:1730)
==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==by 0x78D466: main (main.c:209)
==186005==  Address 0xe594f78 is 1,864 bytes inside a block of size 8,192 alloc'd
==186005==at 0x483A809: malloc (vg_replace_malloc.c:307)
==186005==by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468)
==186005==by 0x7298F9: CreateExprContextInternal (execUtils.c:252)
==186005==by 0x7299DD: CreateExprContext (execUtils.c:302)
==186005==by 0x729C5E: ExecAssignExprContext (execUtils.c:481)
==186005==by 0x75D24D: ExecInitSeqScan (nodeSeqscan.c:147)
==186005==by 0x720BEF: ExecInitNode (execProcnode.c:207)
==186005==by 0x716F99: InitPlan (execMain.c:939)
==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==by 0x949E64: PortalStart (pquery.c:505)
==186005==by 0x9442A2: exec_simple_query (postgres.c:1201)
==186005==by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==by 0x889205: ServerLoop (postmaster.c:1730)
==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==by 0x78D466: main (main.c:209)
==186005== 
{
   
   Memcheck:Addr1
   fun:ExecInitModifyTable
   fun:ExecInitNode
   fun:InitPlan
   fun:standard_ExecutorStart
   fun:ExecutorStart
   fun:ProcessQuery
   fun:PortalRunMulti
   fun:PortalRun
   fun:exec_simple_query
   fun:PostgresMain
   fun:BackendRun
   fun:BackendStartup
   fun:ServerLoop
   fun:PostmasterMain
   fun:main
}
==186005== Invalid write of size 4
==186005==at 0x759C74: ExecInitModifyTable (nodeModifyTable.c:2809)
==186005==by 0x720B35: ExecInitNode (execProcnode.c:174)
==186005==by 0x716F99: InitPlan (execMain.c:939)
==186005==by 0x715F47: standard_ExecutorStart (execMain.c:266)
==186005==by 0x715CD3: ExecutorStart (execMain.c:148)
==186005==by 0x9495B1: ProcessQuery (pquery.c:155)
==186005==by 0x94AF24: PortalRunMulti (pquery.c:1267)
==186005==by 0x94A4C8: PortalRun (pquery.c:779)
==186005==by 0x94438E: exec_simple_query (postgres.c:1240)
==186005==by 0x9485F5: PostgresMain (postgres.c:4394)
==186005==by 0x88D1D4: BackendRun (postmaster.c:4484)
==186005==by 0x88CB53: BackendStartup (postmaster.c:4206)
==186005==by 0x889205: ServerLoop (postmaster.c:1730)
==186005==by 0x888AD0: PostmasterMain (postmaster.c:1402)
==186005==by 0x78D466: main (main.c:209)
==186005==  Address 0xe594f80 is 1,872 bytes inside a block of size 8,192 alloc'd
==186005==at 0x483A809: malloc (vg_replace_malloc.c:307)
==186005==by 0xAFEFE6: AllocSetContextCreateInternal (aset.c:468)
==186005==by 0x7298F9: CreateExprContextInternal (execUtils.c:252)
==186005==by 

Re: POC: postgres_fdw insert batching

2021-01-20 Thread Tomas Vondra

OK, pushed after a little bit of additional polishing (mostly comments).

Thanks everyone!

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-19 Thread Amit Langote
On Wed, Jan 20, 2021 at 1:01 AM Tomas Vondra
 wrote:
> On 1/19/21 7:23 AM, Amit Langote wrote:
> > On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.ta...@fujitsu.com
> >> Actually, I tried to do it (adding the GetModifyBatchSize() call after 
> >> BeginForeignModify()), but it failed.  Because 
> >> postgresfdwGetModifyBatchSize() wants to know if RETURNING is specified, 
> >> and ResultRelInfo->projectReturning is created after the above part.  
> >> Considering the context where GetModifyBatchSize() implementations may 
> >> want to know the environment, I placed the call as late as possible in the 
> >> initialization phase.  As for the future(?) multi-target DML statements, I 
> >> think we can change this together with many(?) other parts that assume a 
> >> single target table.
> >
> > Okay, sometime later then.
> >
> > I wasn't sure if bringing it up here would be appropriate, but there's
> > a patch by me to refactor ModfiyTable result relation allocation that
> > will have to remember to move this code along to an appropriate place
> > [1].  Thanks for the tip about the dependency on how RETURNING is
> > handled.  I will remember it when rebasing my patch over this.
> >
>
> Thanks. The last version (v12) should be addressing all the comments and
> seems fine to me, so barring objections I'll get that pushed shortly.

+1

> One thing that seems a bit annoying is that with the partitioned table
> the explain (verbose) looks like this:
>
>   QUERY PLAN
> -
>   Insert on public.batch_table
> ->  Function Scan on pg_catalog.generate_series i
>   Output: i.i
>   Function Call: generate_series(1, 66)
> (4 rows)
>
> That is, there's no information about the batch size :-( But AFAICS
> that's due to how explain shows (or rather does not show) partitions in
> this type of plan.

Yeah.  Partition result relations are always lazily allocated for
INSERT, so EXPLAIN (without ANALYZE) has no idea what to show for
them, nor does it know which partitions will be used in the first
place.  With ANALYZE however, you could get them from
es_tuple_routing_result_relations and maybe list them if you want, but
that sounds like a project on its own.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: POC: postgres_fdw insert batching

2021-01-19 Thread Tomas Vondra




On 1/19/21 7:23 AM, Amit Langote wrote:

On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.ta...@fujitsu.com
 wrote:

From: Amit Langote 

I apologize in advance for being maybe overly pedantic, but I noticed
that, in ExecInitModifyTable(), you decided to place the call outside
the loop that goes over resultRelations (shown below), although my
intent was to ask to place it next to the BeginForeignModify() in that
loop.


Actually, I tried to do it (adding the GetModifyBatchSize() call after 
BeginForeignModify()), but it failed.  Because postgresfdwGetModifyBatchSize() 
wants to know if RETURNING is specified, and ResultRelInfo->projectReturning is 
created after the above part.  Considering the context where GetModifyBatchSize() 
implementations may want to know the environment, I placed the call as late as 
possible in the initialization phase.  As for the future(?) multi-target DML 
statements, I think we can change this together with other many(?) parts that 
assume a single target table.


Okay, sometime later then.

I wasn't sure if bringing it up here would be appropriate, but there's
a patch by me to refactor ModifyTable result relation allocation that
will have to remember to move this code along to an appropriate place
[1].  Thanks for the tip about the dependency on how RETURNING is
handled.  I will remember it when rebasing my patch over this.



Thanks. The last version (v12) should be addressing all the comments and 
seems fine to me, so barring objections I'll get that pushed shortly.


One thing that seems a bit annoying is that with the partitioned table 
the explain (verbose) looks like this:


 QUERY PLAN
-
 Insert on public.batch_table
   ->  Function Scan on pg_catalog.generate_series i
 Output: i.i
 Function Call: generate_series(1, 66)
(4 rows)

That is, there's no information about the batch size :-( But AFAICS 
that's due to how explain shows (or rather does not show) partitions in 
this type of plan.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-18 Thread Amit Langote
On Tue, Jan 19, 2021 at 2:06 PM tsunakawa.ta...@fujitsu.com
 wrote:
> From: Amit Langote 
> > I apologize in advance for being maybe overly pedantic, but I noticed
> > that, in ExecInitModifyTable(), you decided to place the call outside
> > the loop that goes over resultRelations (shown below), although my
> > intent was to ask to place it next to the BeginForeignModify() in that
> > loop.
>
> Actually, I tried to do it (adding the GetModifyBatchSize() call after 
> BeginForeignModify()), but it failed.  Because 
> postgresfdwGetModifyBatchSize() wants to know if RETURNING is specified, and 
> ResultRelInfo->projectReturning is created after the above part.  Considering 
> the context where GetModifyBatchSize() implementations may want to know the 
> environment, I placed the call as late as possible in the initialization 
> phase.  As for the future(?) multi-target DML statements, I think we can 
> change this together with many(?) other parts that assume a single target 
> table.

Okay, sometime later then.

I wasn't sure if bringing it up here would be appropriate, but there's
a patch by me to refactor ModifyTable result relation allocation that
will have to remember to move this code along to an appropriate place
[1].  Thanks for the tip about the dependency on how RETURNING is
handled.  I will remember it when rebasing my patch over this.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://commitfest.postgresql.org/31/2621/




RE: POC: postgres_fdw insert batching

2021-01-18 Thread tsunakawa.ta...@fujitsu.com
From: Amit Langote 
> I apologize in advance for being maybe overly pedantic, but I noticed
> that, in ExecInitModifyTable(), you decided to place the call outside
> the loop that goes over resultRelations (shown below), although my
> intent was to ask to place it next to the BeginForeignModify() in that
> loop.

Actually, I tried to do it (adding the GetModifyBatchSize() call after 
BeginForeignModify()), but it failed.  Because postgresfdwGetModifyBatchSize() 
wants to know if RETURNING is specified, and ResultRelInfo->projectReturning is 
created after the above part.  Considering the context where 
GetModifyBatchSize() implementations may want to know the environment, I placed 
the call as late as possible in the initialization phase.  As for the future(?) 
multi-target DML statements, I think we can change this together with many(?) 
other parts that assume a single target table.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-18 Thread Amit Langote
Tsunakawa-san,

On Tue, Jan 19, 2021 at 12:50 PM tsunakawa.ta...@fujitsu.com
 wrote:
> From: Tomas Vondra 
> > OK. Can you prepare a final patch, squashing all the commits into a
> > single one, and perhaps use the function in create_foreign_modify?
>
> Attached, including the message fix pointed by Zaihong-san.

Thanks for adopting my suggestions regarding GetForeignModifyBatchSize().

I apologize in advance for being maybe overly pedantic, but I noticed
that, in ExecInitModifyTable(), you decided to place the call outside
the loop that goes over resultRelations (shown below), although my
intent was to ask to place it next to the BeginForeignModify() in that
loop.

resultRelInfo = mtstate->resultRelInfo;
i = 0;
forboth(l, node->resultRelations, l1, node->plans)
{
...
/* Also let FDWs init themselves for foreign-table result rels */
if (!resultRelInfo->ri_usesFdwDirectModify &&
resultRelInfo->ri_FdwRoutine != NULL &&
resultRelInfo->ri_FdwRoutine->BeginForeignModify != NULL)
{
List   *fdw_private = (List *) list_nth(node->fdwPrivLists, i);

resultRelInfo->ri_FdwRoutine->BeginForeignModify(mtstate,
 resultRelInfo,
 fdw_private,
 i,
 eflags);
}

Maybe it's fine today because we only care about inserts and there's
always only one entry in the resultRelations list in that case, but
that may not remain the case in the future.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: POC: postgres_fdw insert batching

2021-01-18 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> OK. Can you prepare a final patch, squashing all the commits into a
> single one, and perhaps use the function in create_foreign_modify?

Attached, including the message fix pointed by Zaihong-san.


Regards
Takayuki Tsunakawa



v12-0001-Add-bulk-insert-for-foreign-tables.patch
Description: v12-0001-Add-bulk-insert-for-foreign-tables.patch


Re: POC: postgres_fdw insert batching

2021-01-18 Thread Tomas Vondra

On 1/19/21 2:28 AM, tsunakawa.ta...@fujitsu.com wrote:

From: Tomas Vondra 

I took a look at this - there's a bit of bitrot due to
708d165ddb92c, so attached is a rebased patch (0001) fixing that.

0002 adds a couple comments and minor tweaks

0003 addresses a couple shortcomings related to explain - we
haven't been showing the batch size for EXPLAIN (VERBOSE), because
there'd be no FdwState, so this tries to fix that. Furthermore,
there were no tests for EXPLAIN output with batch size, so I added
a couple.


Thank you, good additions.  They all look good. Only one point: I
think the code for retrieving batch_size in create_foreign_modify()
can be replaced with a call to the new function in 0003.



OK. Can you prepare a final patch, squashing all the commits into a 
single one, and perhaps use the function in create_foreign_modify?
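
For reference, the helper in question (looking up batch_size from the
table and server options so that create_foreign_modify() and the EXPLAIN
path can share it) can follow the same pattern postgres_fdw uses for
fetch_size; a minimal sketch, with validation omitted and atoi() standing
in for proper parsing:

    static int
    get_batch_size_option(Relation rel)
    {
        ForeignTable *table = GetForeignTable(RelationGetRelid(rel));
        ForeignServer *server = GetForeignServer(table->serverid);
        List       *options = NIL;
        ListCell   *lc;
        int         batch_size = 1;     /* 1 means "no batching" */

        /* table options are checked first, so they take precedence */
        options = list_concat(options, table->options);
        options = list_concat(options, server->options);

        foreach(lc, options)
        {
            DefElem    *def = (DefElem *) lfirst(lc);

            if (strcmp(def->defname, "batch_size") == 0)
            {
                batch_size = atoi(defGetString(def));
                break;
            }
        }

        return batch_size;
    }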



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: POC: postgres_fdw insert batching

2021-01-18 Thread tsunakawa.ta...@fujitsu.com
Tomas-san, Zhihong-san,

From: Zhihong Yu  
> +   if (batch_size <= 0)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_SYNTAX_ERROR),
> +errmsg("%s requires a non-negative integer value",
> 
> It seems the message doesn't match the check w.r.t. the batch size of 0.

Ah, "non-negative" should be "positive".  The message for the existing 
fetch_size should be fixed too.  Tomas-san, could you include this as well?  
I'm sorry to trouble you.


> +   int numInserted = numSlots;
> 
> Since numInserted is filled by ExecForeignBatchInsert(), the initialization 
> can be done with 0.

No, the code is correct, since the batch function requires the number of rows 
to insert as input.
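
A condensed sketch of the call site, to make the in/out convention
explicit (variable names as in the patch, surrounding declarations
elided):

    int     numInserted = numSlots;     /* in: rows handed to the FDW */

    rslots = resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert(estate,
                                                                  resultRelInfo,
                                                                  slots,
                                                                  planSlots,
                                                                  &numInserted);
    /* out: numInserted now reports how many rows were actually inserted */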


Regards
Takayuki Tsunakawa



RE: POC: postgres_fdw insert batching

2021-01-18 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> I took a look at this - there's a bit of bitrot due to 708d165ddb92c, so 
> attached is
> a rebased patch (0001) fixing that.
> 
> 0002 adds a couple comments and minor tweaks
> 
> 0003 addresses a couple shortcomings related to explain - we haven't been
> showing the batch size for EXPLAIN (VERBOSE), because there'd be no
> FdwState, so this tries to fix that. Furthermore, there were no tests for 
> EXPLAIN
> output with batch size, so I added a couple.

Thank you, good additions.  They all look good.
Only one point: I think the code for retrieving batch_size in 
create_foreign_modify() can be replaced with a call to the new function in 0003.

God bless us.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-18 Thread Tomas Vondra



On 1/18/21 7:51 AM, tsunakawa.ta...@fujitsu.com wrote:

Tomas-san,

From: Amit Langote 
Good thing you reminded me that this is about inserts, and in that 
case no, ExecInitModifyTable() doesn't know all leaf partitions,

it only sees the root table whose batch_size doesn't really matter.
So it's really ExecInitRoutingInfo() that I would recommend to set 
ri_BatchSize; right after this block:


/* * If the partition is a foreign table, let the FDW init itself
for * routing tuples to the partition. */ if
(partRelInfo->ri_FdwRoutine != NULL && 
partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL) 
partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate,

partRelInfo);

Note that ExecInitRoutingInfo() is called only once for a
partition when it is initialized after being inserted into for the
first time.

For non-partitioned targets, I'd still say set ri_BatchSize in 
ExecInitModifyTable().


Attached is the patch that adds a call to GetModifyBatchSize() to the
above two places.  The regression test passes.

(FWIW, frankly, I prefer the previous version because the code is a
bit smaller...  Maybe we should refactor the code someday to reduce
similar processing in the partitioned and non-partitioned cases.)



Less code would be nice, but it's not always the right thing to do, 
unfortunately :-(


I took a look at this - there's a bit of bitrot due to 708d165ddb92c, so 
attached is a rebased patch (0001) fixing that.


0002 adds a couple comments and minor tweaks

0003 addresses a couple shortcomings related to explain - we haven't 
been showing the batch size for EXPLAIN (VERBOSE), because there'd be no 
FdwState, so this tries to fix that. Furthermore, there were no tests 
for EXPLAIN output with batch size, so I added a couple.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 9425a8501543d4caf55a302a877745b0f83b6046 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 18 Jan 2021 15:18:14 +0100
Subject: [PATCH 1/3] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c|  43 ++-
 .../postgres_fdw/expected/postgres_fdw.out| 116 ++-
 contrib/postgres_fdw/option.c |  14 +
 contrib/postgres_fdw/postgres_fdw.c   | 302 ++
 contrib/postgres_fdw/postgres_fdw.h   |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql |  91 ++
 doc/src/sgml/fdwhandler.sgml  |  89 +-
 doc/src/sgml/postgres-fdw.sgml|  13 +
 src/backend/executor/execPartition.c  |  12 +
 src/backend/executor/nodeModifyTable.c| 157 +
 src/backend/nodes/list.c  |  15 +
 src/include/foreign/fdwapi.h  |  10 +
 src/include/nodes/execnodes.h |   6 +
 src/include/nodes/pg_list.h   |  15 +
 14 files changed, 822 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
  Index rtindex, Relation rel,
  List *targetAttrs, bool doNothing,
  List *withCheckOptionList, List *returningList,
- List **retrieved_attrs)
+ List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+ int values_end_len, int num_cols,
+ int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 1cad311436..8c0fdb5a9a 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8923,7 +8
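
To see what rebuildInsertSql() above produces, a worked example
(hypothetical foreign table t(a, b); the strings are illustrative, not
test output):

    /* single-row statement built by deparseInsertSql(); values_end_len
     * marks the end of the first VALUES record: */
    INSERT INTO public.t(a, b) VALUES ($1, $2)
                                              ^ values_end_len

    /* rebuildInsertSql(buf, orig_query, values_end_len, num_cols = 2,
     * num_rows = 2) appends two more records, continuing the parameter
     * numbering at num_cols + 1; any RETURNING or ON CONFLICT text that
     * follows values_end_len is copied back at the end: */
    INSERT INTO public.t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6)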

Re: POC: postgres_fdw insert batching

2021-01-18 Thread Zhihong Yu
Hi, Takayuki-san:

+   if (batch_size <= 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("%s requires a non-negative integer value",

It seems the message doesn't match the check w.r.t. the batch size of 0.

+   int numInserted = numSlots;

Since numInserted is filled by ExecForeignBatchInsert(), the initialization
can be done with 0.

Cheers

On Sun, Jan 17, 2021 at 10:52 PM tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

> Tomas-san,
>
> From: Amit Langote 
> > Good thing you reminded me that this is about inserts, and in that
> > case no, ExecInitModifyTable() doesn't know all leaf partitions, it
> > only sees the root table whose batch_size doesn't really matter.  So
> > it's really ExecInitRoutingInfo() that I would recommend to set
> > ri_BatchSize; right after this block:
> >
> > /*
> >  * If the partition is a foreign table, let the FDW init itself for
> >  * routing tuples to the partition.
> >  */
> > if (partRelInfo->ri_FdwRoutine != NULL &&
> > partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
> > partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
> >
> > Note that ExecInitRoutingInfo() is called only once for a partition
> > when it is initialized after being inserted into for the first time.
> >
> > For non-partitioned targets, I'd still say set ri_BatchSize in
> > ExecInitModifyTable().
>
> Attached is the patch that adds a call to GetModifyBatchSize() to the above
> two places.  The regression test passes.
>
> (FWIW, frankly, I prefer the previous version because the code is a bit
> smaller...  Maybe we should refactor the code someday to reduce similar
> processing in the partitioned and non-partitioned cases.)
>
>
> Regards
> Takayuki Tsunakawa
>
>


RE: POC: postgres_fdw insert batching

2021-01-17 Thread tsunakawa.ta...@fujitsu.com
Tomas-san,

From: Amit Langote 
> Good thing you reminded me that this is about inserts, and in that
> case no, ExecInitModifyTable() doesn't know all leaf partitions, it
> only sees the root table whose batch_size doesn't really matter.  So
> it's really ExecInitRoutingInfo() that I would recommend to set
> ri_BatchSize; right after this block:
> 
> /*
>  * If the partition is a foreign table, let the FDW init itself for
>  * routing tuples to the partition.
>  */
> if (partRelInfo->ri_FdwRoutine != NULL &&
> partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
> partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);
> 
> Note that ExecInitRoutingInfo() is called only once for a partition
> when it is initialized after being inserted into for the first time.
> 
> For non-partitioned targets, I'd still say set ri_BatchSize in
> ExecInitModifyTable().

Attached is the patch that adds a call to GetModifyBatchSize() to the above two 
places.  The regression test passes.

(FWIW, frankly, I prefer the previous version because the code is a bit 
smaller...  Maybe we should refactor the code someday to reduce similar 
processing in the partitioned and non-partitioned cases.)


Regards
Takayuki Tsunakawa



v10-0001-Add-bulk-insert-for-foreign-tables.patch
Description: v10-0001-Add-bulk-insert-for-foreign-tables.patch


Re: POC: postgres_fdw insert batching

2021-01-15 Thread Amit Langote
On Sat, Jan 16, 2021 at 12:00 AM tsunakawa.ta...@fujitsu.com
 wrote:
> From: Amit Langote 
> > Okay, so maybe not moving the whole logic into the FDW's
> > BeginForeignModify(), but at least if we move this...
> >
> > @@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
> > +   /*
> > +* Determine if the FDW supports batch insert and determine the
> > batch
> > +* size (a FDW may support batching, but it may be disabled for the
> > +* server/table). Do this only once, at the beginning - we don't 
> > want
> > +* the batch size to change during execution.
> > +*/
> > +   if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
> > +   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
> > +   resultRelInfo->ri_BatchSize == 0)
> > +   resultRelInfo->ri_BatchSize =
> > +   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
> >
> > ...into ExecInitModifyTable(), ExecInsert() only needs the following block:
>
> Does ExecInitModifyTable() know all leaf partitions where the tuples produced 
> by VALUES or SELECT go?  ExecInsert() doesn't find the target leaf partition 
> for the first time through the call to ExecPrepareTupleRouting()?  Leaf 
> partitions can have different batch_size settings.

Good thing you reminded me that this is about inserts, and in that
case no, ExecInitModifyTable() doesn't know all leaf partitions, it
only sees the root table whose batch_size doesn't really matter.  So
it's really ExecInitRoutingInfo() that I would recommend to set
ri_BatchSize; right after this block:

/*
 * If the partition is a foreign table, let the FDW init itself for
 * routing tuples to the partition.
 */
if (partRelInfo->ri_FdwRoutine != NULL &&
partRelInfo->ri_FdwRoutine->BeginForeignInsert != NULL)
partRelInfo->ri_FdwRoutine->BeginForeignInsert(mtstate, partRelInfo);

Note that ExecInitRoutingInfo() is called only once for a partition
when it is initialized after being inserted into for the first time.

For non-partitioned targets, I'd still say set ri_BatchSize in
ExecInitModifyTable().
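
Spelled out, the suggested placement would look roughly like this, right
after the BeginForeignInsert block quoted above (a sketch using the
callback names from the patch under discussion, not a verbatim hunk):

    if (partRelInfo->ri_FdwRoutine != NULL &&
        partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
        partRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
        partRelInfo->ri_BatchSize =
            partRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(partRelInfo);
    else
        partRelInfo->ri_BatchSize = 1;

    Assert(partRelInfo->ri_BatchSize >= 1);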

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: POC: postgres_fdw insert batching

2021-01-15 Thread tsunakawa.ta...@fujitsu.com
From: Amit Langote 
> Okay, so maybe not moving the whole logic into the FDW's
> BeginForeignModify(), but at least if we move this...
> 
> @@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
> +   /*
> +* Determine if the FDW supports batch insert and determine the
> batch
> +* size (a FDW may support batching, but it may be disabled for the
> +* server/table). Do this only once, at the beginning - we don't want
> +* the batch size to change during execution.
> +*/
> +   if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
> +   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
> +   resultRelInfo->ri_BatchSize == 0)
> +   resultRelInfo->ri_BatchSize =
> +   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
> 
> ...into ExecInitModifyTable(), ExecInsert() only needs the following block:

Does ExecInitModifyTable() know all leaf partitions where the tuples produced 
by VALUES or SELECT go?  ExecInsert() doesn't find the target leaf partition 
for the first time through the call to ExecPrepareTupleRouting()?  Leaf 
partitions can have different batch_size settings.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-15 Thread Amit Langote
On Fri, Jan 15, 2021 at 12:05 AM Tomas Vondra
 wrote:
> Attached is v9 with all of those tweaks,

Thanks.

> except for moving the BatchSize
> call to BeginForeignModify - I tried that, but it did not seem like an
> improvement, because we'd still need the checks for API callbacks in
> ExecInsert for example. So I decided not to do that.

Okay, so maybe not moving the whole logic into the FDW's
BeginForeignModify(), but at least if we move this...

@@ -441,6 +449,72 @@ ExecInsert(ModifyTableState *mtstate,
+   /*
+* Determine if the FDW supports batch insert and determine the batch
+* size (a FDW may support batching, but it may be disabled for the
+* server/table). Do this only once, at the beginning - we don't want
+* the batch size to change during execution.
+*/
+   if (resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
+   resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert &&
+   resultRelInfo->ri_BatchSize == 0)
+   resultRelInfo->ri_BatchSize =
+   resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);

...into ExecInitModifyTable(), ExecInsert() only needs the following block:

   /*
+* If the FDW supports batching, and batching is requested, accumulate
+* rows and insert them in batches. Otherwise use the per-row inserts.
+*/
+   if (resultRelInfo->ri_BatchSize > 1)
+   {
+ ...

AFAICS, I don't see anything that will cause ri_BatchSize to become 0
once set, so I don't see the point of checking whether it needs to be set
again on every ExecInsert() call.  Also, maybe not that important, but
shaving off 3 comparisons for every tuple would add up nicely IMHO,
especially given that we're targeting bulk loads.
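
Spelled out, the proposed split (a sketch based on the hunk quoted above,
not a verbatim patch):

    /* In ExecInitModifyTable(), once at executor startup: */
    if (resultRelInfo->ri_FdwRoutine != NULL &&
        resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize &&
        resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert)
        resultRelInfo->ri_BatchSize =
            resultRelInfo->ri_FdwRoutine->GetForeignModifyBatchSize(resultRelInfo);
    else
        resultRelInfo->ri_BatchSize = 1;

    /* In ExecInsert(), per tuple, only the single test remains: */
    if (resultRelInfo->ri_BatchSize > 1)
    {
        /* ... accumulate rows and flush them in batches ... */
    }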

--
Amit Langote
EDB: http://www.enterprisedb.com




RE: POC: postgres_fdw insert batching

2021-01-14 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Attached is v9 with all of those tweaks, except for moving the BatchSize call 
> to
> BeginForeignModify - I tried that, but it did not seem like an improvement,
> because we'd still need the checks for API callbacks in ExecInsert for 
> example.
> So I decided not to do that.

Thanks, Tomas-san.  The patch looks good again.

Amit-san, thank you for teaching us about es_tuple_routing_result_relations and 
es_opened_result_relations.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-14 Thread Tomas Vondra


On 1/14/21 2:57 PM, Amit Langote wrote:
> On Thu, Jan 14, 2021 at 21:57 Tomas Vondra
>  wrote:
> 
> On 1/14/21 9:58 AM, Amit Langote wrote:
> > Hi,
> >
> > On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
> >  wrote:
> >> On 1/13/21 3:43 PM, Tomas Vondra wrote:
> >>> Thanks for the report. Yeah, I think there's a missing check in
> >>> ExecInsert. Adding
> >>>
> >>>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
> >>>
> >>> solves this. But now I'm wondering if this is the wrong place to
> make
> >>> this decision. I mean, why should we make the decision here,
> when the
> >>> decision whether to have a RETURNING clause is made in
> postgres_fdw in
> >>> deparseReturningList? We don't really know what the other FDWs
> will do,
> >>> for example.
> >>>
> >>> So I think we should just move all of this into
> GetModifyBatchSize. We
> >>> can start with ri_BatchSize = 0. And then do
> >>>
> >>>   if (resultRelInfo->ri_BatchSize == 0)
> >>>     resultRelInfo->ri_BatchSize =
> >>>       resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
> >>>
> >>>   if (resultRelInfo->ri_BatchSize > 1)
> >>>   {
> >>>     ... do batching ...
> >>>   }
> >>>
> >>> The GetModifyBatchSize would always return value > 0, so either
> 1 (no
> >>> batching) or >1 (batching).
> >>>
> >>
> >> FWIW the attached v8 patch does this - most of the conditions are
> moved
> >> to the GetModifyBatchSize() callback.
> >
> > Thanks.  A few comments:
> >
> > * I agree with leaving it up to an FDW to look at the properties of
> > the table and of the operation being performed to decide whether or
> > not to use batching, although maybe BeginForeignModify() is a better
> > place for putting that logic instead of GetModifyBatchSize()?  So, in
> > create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
> > being set to match the table's or the server's value for the
> > batch_size option, make it also consider the things that prevent
> > batching and set the execution state's batch_size based on that.
> > GetModifyBatchSize() simply returns that value.
> >
> > * Regarding the timing of calling GetModifyBatchSize() to set
> > ri_BatchSize, I wonder if it wouldn't be better to call it just once,
> > say from ExecInitModifyTable(), right after BeginForeignModify()
> > returns?  I don't quite understand why it is being called from
> > ExecInsert().  Can the batch size change once the execution starts?
> >
> 
> But it should be called just once. The idea is that initially we have
> batch_size=0, and the first call returns a value that is >= 1. So we never
> call it again. But maybe it could be called from BeginForeignModify, in
> which case we'd not need this logic with first setting it to 0 etc.
> 
> 
> Right, although I was thinking that maybe ri_BatchSize itself is not to
> be written to by the FDW.  Not to say that’s doing anything wrong though.
> 
> > * Lastly, how about calling it GetForeignModifyBatchSize() to be
> > consistent with other nearby callbacks?
> >
> 
> Yeah, good point.
> 
> >> I've removed the check for the
> >> BatchInsert callback, though - the FDW knows whether it supports
> that,
> >> and it seems a bit pointless at the moment as there are no other
> batch
> >> callbacks. Maybe we should add an Assert somewhere, though?
> >
> > Hmm, not checking whether BatchInsert() exists may not be good idea,
> > because if an FDW's GetModifyBatchSize() returns a value > 1 but
> > there's no BatchInsert() function to call, ExecBatchInsert() would
> > trip.  I don't see the newly added documentation telling FDW authors
> > to either define both or none.
> >
> 
> Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
> it can't hurt, I guess. I'll add it back.
> 
> > Regarding how this plays with partitions, I don't think we need
> > ExecGetTouchedPartitions(), because you can get the routed-to
> > partitions using es_tuple_routing_result_relations.  Also, perhaps
> 
> I'm not very familiar with es_tuple_routing_result_relations, but that
> doesn't seem to work. I've replaced the flushing code at the end of
> ExecModifyTable with a loop over es_tuple_routing_result_relations, but
> then some of the rows are missing (i.e. not flushed).
> 
> 
> I should’ve mentioned es_opened_result_relations too, which contains
> non-routing result relations.  So I really meant if (proute) then use
> es_tuple_routing_result_relations, else es_opened_result_relations. 
> This should work as long as batching is only used for inserts.
> 

Ah, right. That did the trick.

Attached 

Re: POC: postgres_fdw insert batching

2021-01-14 Thread Amit Langote
On Thu, Jan 14, 2021 at 21:57 Tomas Vondra 
wrote:

> On 1/14/21 9:58 AM, Amit Langote wrote:
> > Hi,
> >
> > On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
> >  wrote:
> >> On 1/13/21 3:43 PM, Tomas Vondra wrote:
> >>> Thanks for the report. Yeah, I think there's a missing check in
> >>> ExecInsert. Adding
> >>>
> >>>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
> >>>
> >>> solves this. But now I'm wondering if this is the wrong place to make
> >>> this decision. I mean, why should we make the decision here, when the
> >>> decision whether to have a RETURNING clause is made in postgres_fdw in
> >>> deparseReturningList? We don't really know what the other FDWs will do,
> >>> for example.
> >>>
> >>> So I think we should just move all of this into GetModifyBatchSize. We
> >>> can start with ri_BatchSize = 0. And then do
> >>>
> >>>   if (resultRelInfo->ri_BatchSize == 0)
> >>> resultRelInfo->ri_BatchSize =
> >>>   resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
> >>>
> >>>   if (resultRelInfo->ri_BatchSize > 1)
> >>>   {
> >>> ... do batching ...
> >>>   }
> >>>
> >>> The GetModifyBatchSize would always return value > 0, so either 1 (no
> >>> batching) or >1 (batching).
> >>>
> >>
> >> FWIW the attached v8 patch does this - most of the conditions are moved
> >> to the GetModifyBatchSize() callback.
> >
> > Thanks.  A few comments:
> >
> > * I agree with leaving it up to an FDW to look at the properties of
> > the table and of the operation being performed to decide whether or
> > not to use batching, although maybe BeginForeignModify() is a better
> > place for putting that logic instead of GetModifyBatchSize()?  So, in
> > create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
> > being set to match the table's or the server's value for the
> > batch_size option, make it also consider the things that prevent
> > batching and set the execution state's batch_size based on that.
> > GetModifyBatchSize() simply returns that value.
> >
> > * Regarding the timing of calling GetModifyBatchSize() to set
> > ri_BatchSize, I wonder if it wouldn't be better to call it just once,
> > say from ExecInitModifyTable(), right after BeginForeignModify()
> > returns?  I don't quite understand why it is being called from
> > ExecInsert().  Can the batch size change once the execution starts?
> >
>
> But it should be called just once. The idea is that initially we have
> batch_size=0, and the first call returns a value that is >= 1. So we never
> call it again. But maybe it could be called from BeginForeignModify, in
> which case we'd not need this logic with first setting it to 0 etc.


Right, although I was thinking that maybe ri_BatchSize itself is not meant to
be written to by the FDW.  Not to say that’s doing anything wrong, though.

> > * Lastly, how about calling it GetForeignModifyBatchSize() to be
> > consistent with other nearby callbacks?
> >
>
> Yeah, good point.
>
> >> I've removed the check for the
> >> BatchInsert callback, though - the FDW knows whether it supports that,
> >> and it seems a bit pointless at the moment as there are no other batch
> >> callbacks. Maybe we should add an Assert somewhere, though?
> >
> > Hmm, not checking whether BatchInsert() exists may not be a good idea,
> > because if an FDW's GetModifyBatchSize() returns a value > 1 but
> > there's no BatchInsert() function to call, ExecBatchInsert() would
> > trip.  I don't see the newly added documentation telling FDW authors
> > to either define both or none.
> >
>
> Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
> it can't hurt, I guess. I'll add it back.
>
> > Regarding how this plays with partitions, I don't think we need
> > ExecGetTouchedPartitions(), because you can get the routed-to
> > partitions using es_tuple_routing_result_relations.  Also, perhaps
>
> I'm not very familiar with es_tuple_routing_result_relations, but that
> doesn't seem to work. I've replaced the flushing code at the end of
> ExecModifyTable with a loop over es_tuple_routing_result_relations, but
> then some of the rows are missing (i.e. not flushed).


I should’ve mentioned es_opened_result_relations too, which contains
non-routing result relations.  So I really meant if (proute) then use
es_tuple_routing_result_relations, else es_opened_result_relations.  This
should work as long as batching is only used for inserts.
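
(A minimal sketch of that flushing logic at the end of ExecModifyTable(),
using the patch's ri_Slots / ri_NumSlots bookkeeping -- an editor's
illustration with approximate names, not necessarily the committed code:)

  List       *relinfos;
  ListCell   *lc;

  /* routed-to partitions if tuple routing was used, otherwise the
   * result relations opened by ExecInitModifyTable() itself */
  if (proute)
      relinfos = estate->es_tuple_routing_result_relations;
  else
      relinfos = estate->es_opened_result_relations;

  foreach(lc, relinfos)
  {
      ResultRelInfo *resultRelInfo = (ResultRelInfo *) lfirst(lc);

      /* flush only relations that still hold batched tuples */
      if (resultRelInfo->ri_NumSlots > 0)
          ExecBatchInsert(node, resultRelInfo,
                          resultRelInfo->ri_Slots,
                          resultRelInfo->ri_PlanSlots,
                          resultRelInfo->ri_NumSlots,
                          estate, node->canSetTag);
  }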


> > it's a good idea to put the "finishing" ExecBatchInsert() calls into a
> > function ExecFinishBatchInsert().  Maybe the logic to choose the
> > relations to perform the finishing calls on will get complicated in
> > the future as batching is added for updates/deletes too and it seems
> > better to encapsulate that in the separate function than have it out
> > in the open in ExecModifyTable().
> >
>
> IMO that'd be over-engineering at this point. We don't need such a
> separate function yet, so why complicate the API? If we need it in the
> future, we can add it.


Fair enough.
-- 
Am

Re: POC: postgres_fdw insert batching

2021-01-14 Thread Tomas Vondra
On 1/14/21 9:58 AM, Amit Langote wrote:
> Hi,
> 
> On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
>  wrote:
>> On 1/13/21 3:43 PM, Tomas Vondra wrote:
>>> Thanks for the report. Yeah, I think there's a missing check in
>>> ExecInsert. Adding
>>>
>>>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
>>>
>>> solves this. But now I'm wondering if this is the wrong place to make
>>> this decision. I mean, why should we make the decision here, when the
>>> decision whether to have a RETURNING clause is made in postgres_fdw in
>>> deparseReturningList? We don't really know what the other FDWs will do,
>>> for example.
>>>
>>> So I think we should just move all of this into GetModifyBatchSize. We
>>> can start with ri_BatchSize = 0. And then do
>>>
>>>   if (resultRelInfo->ri_BatchSize == 0)
>>> resultRelInfo->ri_BatchSize =
>>>   resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
>>>
>>>   if (resultRelInfo->ri_BatchSize > 1)
>>>   {
>>> ... do batching ...
>>>   }
>>>
>>> The GetModifyBatchSize would always return value > 0, so either 1 (no
>>> batching) or >1 (batching).
>>>
>>
>> FWIW the attached v8 patch does this - most of the conditions are moved
>> to the GetModifyBatchSize() callback.
> 
> Thanks.  A few comments:
> 
> * I agree with leaving it up to an FDW to look at the properties of
> the table and of the operation being performed to decide whether or
> not to use batching, although maybe BeginForeignModify() is a better
> place for putting that logic instead of GetModifyBatchSize()?  So, in
> create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
> being set to match the table's or the server's value for the
> batch_size option, make it also consider the things that prevent
> batching and set the execution state's batch_size based on that.
> GetModifyBatchSize() simply returns that value.
> 
> * Regarding the timing of calling GetModifyBatchSize() to set
> ri_BatchSize, I wonder if it wouldn't be better to call it just once,
> say from ExecInitModifyTable(), right after BeginForeignModify()
> returns?  I don't quite understand why it is being called from
> ExecInsert().  Can the batch size change once the execution starts?
> 

But it should be called just once. The idea is that initially we have
batch_size=0, and the first call returns a value that is >= 1. So we never
call it again. But maybe it could be called from BeginForeignModify, in
which case we'd not need this logic with first setting it to 0 etc.
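
(For illustration, a sketch of the one-time variant in ExecInitModifyTable(),
right after BeginForeignModify() returns -- editor's sketch, using the
callback name in use at this point in the thread:)

  /* determine the batch size once, at executor startup */
  if (operation == CMD_INSERT &&
      resultRelInfo->ri_FdwRoutine != NULL &&
      resultRelInfo->ri_FdwRoutine->GetModifyBatchSize != NULL)
      resultRelInfo->ri_BatchSize =
          resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
  else
      resultRelInfo->ri_BatchSize = 1;    /* no batching */

  Assert(resultRelInfo->ri_BatchSize >= 1);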

> * Lastly, how about calling it GetForeignModifyBatchSize() to be
> consistent with other nearby callbacks?
> 

Yeah, good point.

>> I've removed the check for the
>> BatchInsert callback, though - the FDW knows whether it supports that,
>> and it seems a bit pointless at the moment as there are no other batch
>> callbacks. Maybe we should add an Assert somewhere, though?
> 
> Hmm, not checking whether BatchInsert() exists may not be a good idea,
> because if an FDW's GetModifyBatchSize() returns a value > 1 but
> there's no BatchInsert() function to call, ExecBatchInsert() would
> trip.  I don't see the newly added documentation telling FDW authors
> to either define both or none.
> 

Hmm. The BatchInsert check seemed somewhat unnecessary to me, but OTOH
it can't hurt, I guess. I'll add it back.

> Regarding how this plays with partitions, I don't think we need
> ExecGetTouchedPartitions(), because you can get the routed-to
> partitions using es_tuple_routing_result_relations.  Also, perhaps

I'm not very familiar with es_tuple_routing_result_relations, but that
doesn't seem to work. I've replaced the flushing code at the end of
ExecModifyTable with a loop over es_tuple_routing_result_relations, but
then some of the rows are missing (i.e. not flushed).

> it's a good idea to put the "finishing" ExecBatchInsert() calls into a
> function ExecFinishBatchInsert().  Maybe the logic to choose the
> relations to perform the finishing calls on will get complicated in
> the future as batching is added for updates/deletes too and it seems
> better to encapsulate that in the separate function than have it out
> in the open in ExecModifyTable().
> 

IMO that'd be over-engineering at this point. We don't need such a
separate function yet, so why complicate the API? If we need it in the
future, we can add it.

> (Sorry about being so late reviewing this.)

thanks

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: POC: postgres_fdw insert batching

2021-01-14 Thread Amit Langote
Hi,

On Thu, Jan 14, 2021 at 2:41 AM Tomas Vondra
 wrote:
> On 1/13/21 3:43 PM, Tomas Vondra wrote:
> > Thanks for the report. Yeah, I think there's a missing check in
> > ExecInsert. Adding
> >
> >   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
> >
> > solves this. But now I'm wondering if this is the wrong place to make
> > this decision. I mean, why should we make the decision here, when the
> > decision whether to have a RETURNING clause is made in postgres_fdw in
> > deparseReturningList? We don't really know what the other FDWs will do,
> > for example.
> >
> > So I think we should just move all of this into GetModifyBatchSize. We
> > can start with ri_BatchSize = 0. And then do
> >
> >   if (resultRelInfo->ri_BatchSize == 0)
> > resultRelInfo->ri_BatchSize =
> >   resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
> >
> >   if (resultRelInfo->ri_BatchSize > 1)
> >   {
> > ... do batching ...
> >   }
> >
> > The GetModifyBatchSize would always return value > 0, so either 1 (no
> > batching) or >1 (batching).
> >
>
> FWIW the attached v8 patch does this - most of the conditions are moved
> to the GetModifyBatchSize() callback.

Thanks.  A few comments:

* I agree with leaving it up to an FDW to look at the properties of
the table and of the operation being performed to decide whether or
not to use batching, although maybe BeginForeignModify() is a better
place for putting that logic instead of GetModifyBatchSize()?  So, in
create_foreign_modify(), instead of PgFdwModifyState.batch_size simply
being set to match the table's or the server's value for the
batch_size option, make it also consider the things that prevent
batching and set the execution state's batch_size based on that.
GetModifyBatchSize() simply returns that value.
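
(A sketch of that arrangement, assuming create_foreign_modify() caches the
decision; editor's illustration, and the option-lookup helper here is
hypothetical:)

  /* in create_foreign_modify(), sketch only */
  if (operation == CMD_INSERT &&
      !fmstate->has_returning &&
      !(rel->trigdesc && rel->trigdesc->trig_insert_after_row))
      fmstate->batch_size = get_batch_size_option(rel);  /* hypothetical */
  else
      fmstate->batch_size = 1;

  /* ...and the callback just hands that value back */
  static int
  postgresGetModifyBatchSize(ResultRelInfo *resultRelInfo)
  {
      PgFdwModifyState *fmstate =
          (PgFdwModifyState *) resultRelInfo->ri_FdwState;

      return fmstate->batch_size;     /* always >= 1 */
  }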

* Regarding the timing of calling GetModifyBatchSize() to set
ri_BatchSize, I wonder if it wouldn't be better to call it just once,
say from ExecInitModifyTable(), right after BeginForeignModify()
returns?  I don't quite understand why it is being called from
ExecInsert().  Can the batch size change once the execution starts?

* Lastly, how about calling it GetForeignModifyBatchSize() to be
consistent with other nearby callbacks?

> I've removed the check for the
> BatchInsert callback, though - the FDW knows whether it supports that,
> and it seems a bit pointless at the moment as there are no other batch
> callbacks. Maybe we should add an Assert somewhere, though?

Hmm, not checking whether BatchInsert() exists may not be a good idea,
because if an FDW's GetModifyBatchSize() returns a value > 1 but
there's no BatchInsert() function to call, ExecBatchInsert() would
trip.  I don't see the newly added documentation telling FDW authors
to either define both or none.
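
(A minimal sketch of such a guard in the core code; the callback name
follows this patch's fdwapi.h additions and may differ:)

  /* refuse to batch if the FDW advertised a size it cannot serve */
  if (resultRelInfo->ri_BatchSize > 1 &&
      resultRelInfo->ri_FdwRoutine->ExecForeignBatchInsert == NULL)
      elog(ERROR, "batch size > 1 requires a BatchInsert callback");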

Regarding how this plays with partitions, I don't think we need
ExecGetTouchedPartitions(), because you can get the routed-to
partitions using es_tuple_routing_result_relations.  Also, perhaps
it's a good idea to put the "finishing" ExecBatchInsert() calls into a
function ExecFinishBatchInsert().  Maybe the logic to choose the
relations to perform the finishing calls on will get complicated in
the future as batching is added for updates/deletes too and it seems
better to encapsulate that in the separate function than have it out
in the open in ExecModifyTable().

(Sorry about being so late reviewing this.)

--
Amit Langote
EDB: http://www.enterprisedb.com




RE: POC: postgres_fdw insert batching

2021-01-13 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> FWIW the attached v8 patch does this - most of the conditions are moved to the
> GetModifyBatchSize() callback. I've removed the check for the BatchInsert
> callback, though - the FDW knows whether it supports that, and it seems a bit
> pointless at the moment as there are no other batch callbacks. Maybe we
> should add an Assert somewhere, though?

Thank you.  I'm in favor of this idea that the decision to support RETURNING and 
triggers is left to the FDW.  I don't think another Assert is needed, as 
the caller already has one for the returned batch size.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-13 Thread Tomas Vondra
On 1/13/21 3:43 PM, Tomas Vondra wrote:
>
> ...
>
> Thanks for the report. Yeah, I think there's a missing check in
> ExecInsert. Adding
> 
>   (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)
> 
> solves this. But now I'm wondering if this is the wrong place to make
> this decision. I mean, why should we make the decision here, when the
> decision whether to have a RETURNING clause is made in postgres_fdw in
> deparseReturningList? We don't really know what the other FDWs will do,
> for example.
> 
> So I think we should just move all of this into GetModifyBatchSize. We
> can start with ri_BatchSize = 0. And then do
> 
>   if (resultRelInfo->ri_BatchSize == 0)
> resultRelInfo->ri_BatchSize =
>   resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);
> 
>   if (resultRelInfo->ri_BatchSize > 1)
>   {
> ... do batching ...
>   }
> 
> The GetModifyBatchSize would always return value > 0, so either 1 (no
> batching) or >1 (batching).
> 

FWIW the attached v8 patch does this - most of the conditions are moved
to the GetModifyBatchSize() callback. I've removed the check for the
BatchInsert callback, though - the FDW knows whether it supports that,
and it seems a bit pointless at the moment as there are no other batch
callbacks. Maybe we should add an Assert somewhere, though?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
  Index rtindex, Relation rel,
  List *targetAttrs, bool doNothing,
  List *withCheckOptionList, List *returningList,
- List **retrieved_attrs)
+ List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+ int values_end_len, int num_cols,
+ int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
 END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size, batch_size
 CONTEXT:  SQL statement "ALTER SERVER loopback_nopw OPTIONS (ADD password 'dummypw')"
 PL/pgSQL function inline_code_block line 3 at EXECUTE
 -- If we add a password for our user mapping instead, we should get a different
@@ -9053,3 +9053,117 @@ SELECT 1 FROM ft1 LIMIT 1;
 ALTER SERVER loopback OPTIONS (ADD use_remote_estimate 'off');
 -- The invalid connection gets closed in pgfdw_xact_callback during commit.
 C
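
(Editor's illustration of what rebuildInsertSql() in the attached patch
produces, assuming a two-column foreign table and a batch of three rows:)

/*
 * orig_query:      INSERT INTO t (a, b) VALUES ($1, $2)
 * values_end_len:  position just past "($1, $2)"
 *
 * rebuildInsertSql(buf, orig_query, values_end_len, 2, 2) copies the
 * original text up to values_end_len, appends two more parameter
 * records numbered from num_cols + 1 = 3, and copies the remainder:
 *
 *     INSERT INTO t (a, b) VALUES ($1, $2), ($3, $4), ($5, $6)
 *
 * Anything after the VALUES list (ON CONFLICT, RETURNING) is therefore
 * preserved unchanged at the end.
 */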

Re: POC: postgres_fdw insert batching

2021-01-13 Thread Tomas Vondra
On 1/13/21 10:15 AM, Amit Langote wrote:
> Hi Tomas, Tsunakawa-san,
> 
> Thanks for your work on this.
> 
> On Tue, Jan 12, 2021 at 11:06 AM Tomas Vondra
>  wrote:
>> AFAICS the discussions about making this use COPY and/or libpq
>> pipelining (neither of which is committed yet) ended with the conclusion
>> that those changes are somewhat independent, and that it's worth getting
>> this committed in the current form. Barring objections, I'll push this
>> within the next couple days.
> 
> I was trying this out today (been meaning to do so for a while) and
> noticed that this fails when there are AFTER ROW triggers on the
> foreign table.  Here's an example:
> 
> create extension postgres_fdw ;
> create server lb foreign data wrapper postgres_fdw ;
> create user mapping for current_user server lb;
> create table p (a numeric primary key);
> create foreign table fp (a int) server lb options (table_name 'p');
> create function print_row () returns trigger as $$ begin raise notice
> '%', new; return null; end; $$ language plpgsql;
> create trigger after_insert_trig after insert on fp for each row
> execute function print_row();
> insert into fp select generate_series (1, 10);
> 
> 
> Apparently, the new code seems to assume that batching wouldn't be
> active when the original query contains a RETURNING clause but some
> parts fail to account for the case where RETURNING is added to the
> query to retrieve the tuple to pass to the AFTER TRIGGER.
> Specifically, the Assert in the following block in
> execute_foreign_modify() is problematic:
> 
> /* Check number of rows affected, and fetch RETURNING tuple if any */
> if (fmstate->has_returning)
> {
> Assert(*numSlots == 1);
> n_rows = PQntuples(res);
> if (n_rows > 0)
> store_returning_result(fmstate, slots[0], res);
> }
> 

Thanks for the report. Yeah, I think there's a missing check in
ExecInsert. Adding

  (!resultRelInfo->ri_TrigDesc->trig_insert_after_row)

solves this. But now I'm wondering if this is the wrong place to make
this decision. I mean, why should we make the decision here, when the
decision whether to have a RETURNING clause is made in postgres_fdw in
deparseReturningList? We don't really know what the other FDWs will do,
for example.

So I think we should just move all of this into GetModifyBatchSize. We
can start with ri_BatchSize = 0. And then do

  if (resultRelInfo->ri_BatchSize == 0)
resultRelInfo->ri_BatchSize =
  resultRelInfo->ri_FdwRoutine->GetModifyBatchSize(resultRelInfo);

  if (resultRelInfo->ri_BatchSize > 1)
  {
... do batching ...
  }

The GetModifyBatchSize would always return value > 0, so either 1 (no
batching) or >1 (batching).
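
(One detail worth keeping in mind wherever the check ends up: ri_TrigDesc
is NULL for relations without triggers, so the condition presumably needs
a guard. A sketch, e.g. inside the FDW's GetModifyBatchSize():)

  /* no batching with RETURNING or AFTER ROW insert triggers */
  if (fmstate->has_returning ||
      (resultRelInfo->ri_TrigDesc != NULL &&
       resultRelInfo->ri_TrigDesc->trig_insert_after_row))
      return 1;

  return fmstate->batch_size;   /* configured value, >= 1 */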


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
  Index rtindex, Relation rel,
  List *targetAttrs, bool doNothing,
  List *withCheckOptionList, List *returningList,
- List **retrieved_attrs)
+ List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+ int values_end_len, int num_cols,
+ int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
 END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: servi

Re: POC: postgres_fdw insert batching

2021-01-13 Thread Amit Langote
Hi Tomas, Tsunakawa-san,

Thanks for your work on this.

On Tue, Jan 12, 2021 at 11:06 AM Tomas Vondra
 wrote:
> AFAICS the discussions about making this use COPY and/or libpq
> pipelining (neither of which is committed yet) ended with the conclusion
> that those changes are somewhat independent, and that it's worth getting
> this committed in the current form. Barring objections, I'll push this
> within the next couple days.

I was trying this out today (been meaning to do so for a while) and
noticed that this fails when there are AFTER ROW triggers on the
foreign table.  Here's an example:

create extension postgres_fdw ;
create server lb foreign data wrapper postgres_fdw ;
create user mapping for current_user server lb;
create table p (a numeric primary key);
create foreign table fp (a int) server lb options (table_name 'p');
create function print_row () returns trigger as $$ begin raise notice
'%', new; return null; end; $$ language plpgsql;
create trigger after_insert_trig after insert on fp for each row
execute function print_row();
insert into fp select generate_series (1, 10);


Apparently, the new code seems to assume that batching wouldn't be
active when the original query contains a RETURNING clause but some
parts fail to account for the case where RETURNING is added to the
query to retrieve the tuple to pass to the AFTER TRIGGER.
Specifically, the Assert in the following block in
execute_foreign_modify() is problematic:

/* Check number of rows affected, and fetch RETURNING tuple if any */
if (fmstate->has_returning)
{
Assert(*numSlots == 1);
n_rows = PQntuples(res);
if (n_rows > 0)
store_returning_result(fmstate, slots[0], res);
}

-- 
Amit Langote
EDB: http://www.enterprisedb.com




RE: POC: postgres_fdw insert batching

2021-01-11 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Attached is a v6 of this patch, rebased to current master and with some minor
> improvements (mostly comments and renaming the "end" struct field to
> "values_end" which I think is more descriptive).

Thank you very much.  In fact, my initial patches used values_end, and I 
changed it to len considering that it may be used for bulk UPDATE and DELETE in 
the future.  But I think values_end makes its role easier to understand, too.


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2021-01-11 Thread Tomas Vondra

Hi,

Attached is a v6 of this patch, rebased to current master and with some 
minor improvements (mostly comments and renaming the "end" struct field 
to "values_end" which I think is more descriptive).


The one thing that keeps bugging me is convert_prep_stmt_params - it 
does the right thing, but the code is somewhat confusing.



AFAICS the discussions about making this use COPY and/or libpq 
pipelining (neither of which is committed yet) ended with the conclusion 
that those changes are somewhat independent, and that it's worth getting 
this committed in the current form. Barring objections, I'll push this 
within the next couple days.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 1e4a99c6d4a5221dadc9e7a9922bdd9e3ebe1310 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Tue, 12 Jan 2021 01:36:01 +0100
Subject: [PATCH] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c                 |  43 ++-
 .../postgres_fdw/expected/postgres_fdw.out     | 116 ++-
 contrib/postgres_fdw/option.c                  |  14 +
 contrib/postgres_fdw/postgres_fdw.c            | 291 ++
 contrib/postgres_fdw/postgres_fdw.h            |   5 +-
 contrib/postgres_fdw/sql/postgres_fdw.sql      |  91 ++
 doc/src/sgml/fdwhandler.sgml                   |  89 +-
 doc/src/sgml/postgres-fdw.sgml                 |  13 +
 src/backend/executor/execPartition.c           |  11 +
 src/backend/executor/nodeModifyTable.c         | 161 ++
 src/backend/nodes/list.c                       |  15 +
 src/include/executor/execPartition.h           |   1 +
 src/include/foreign/fdwapi.h                   |  10 +
 src/include/nodes/execnodes.h                  |   6 +
 src/include/nodes/pg_list.h                    |  15 +
 15 files changed, 815 insertions(+), 66 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 3cf7b4eb1e..2d38ab25cb 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1711,7 +1711,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
  Index rtindex, Relation rel,
  List *targetAttrs, bool doNothing,
  List *withCheckOptionList, List *returningList,
- List **retrieved_attrs)
+ List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1754,6 +1754,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
@@ -1763,6 +1764,46 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 		 withCheckOptionList, returningList, retrieved_attrs);
 }
 
+/*
+ * rebuild remote INSERT statement
+ *
+ */
+void
+rebuildInsertSql(StringInfo buf, char *orig_query,
+ int values_end_len, int num_cols,
+ int num_rows)
+{
+	int			i, j;
+	int			pindex;
+	bool		first;
+
+	/* Copy up to the end of the first record from the original query */
+	appendBinaryStringInfo(buf, orig_query, values_end_len);
+
+	/* Add records to VALUES clause */
+	pindex = num_cols + 1;
+	for (i = 0; i < num_rows; i++)
+	{
+		appendStringInfoString(buf, ", (");
+
+		first = true;
+		for (j = 0; j < num_cols; j++)
+		{
+			if (!first)
+				appendStringInfoString(buf, ", ");
+			first = false;
+
+			appendStringInfo(buf, "$%d", pindex);
+			pindex++;
+		}
+
+		appendStringInfoChar(buf, ')');
+	}
+
+	/* Copy stuff after VALUES clause from the original query */
+	appendStringInfoString(buf, orig_query + values_end_len);
+}
+
 /*
  * deparse remote UPDATE statement
  *
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index c11092f8cc..96bad17ded 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -8911,7 +8911,7 @@ DO $d$
 END;
 $d$;
 ERROR:  invalid option "password"
-HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updatable, fetch_size
+HINT:  Valid options in this context are: service, passfile, channel_binding, connect_timeout, dbname, host, hostaddr, port, options, application_name, keepalives, keepalives_idle, keepalives_interval, keepalives_count, tcp_user_timeout, sslmode, sslcompression, sslcert, sslkey, sslrootcert, sslcrl, requirepeer, ssl_min_protocol_version, ssl_max_protocol_version, gssencmode, krbsrvname, gsslib, target_session_attrs, use_remote_estimate, fdw_startup_cost, fdw_tuple_cost, extensions, updata

RE: POC: postgres_fdw insert batching

2020-11-30 Thread tsunakawa.ta...@fujitsu.com
From: David Fetter 
> Please pardon me for barging in late in this discussion, but if we're
> going to be using a bulk API here, wouldn't it make more sense to use
> COPY, except where RETURNING is specified, in place of INSERT?

Please do not hesitate.  I mentioned earlier in this thread that I think INSERT 
is better because:


--
* When the user executes INSERT statements, it would look strange to the user 
if the remote SQL is displayed as COPY.

* COPY doesn't invoke rules, unlike INSERT.  (I don't think rules are a 
feature that users care about, though.)  Also, I'm a bit concerned that there 
might be, or will be, other differences between INSERT and COPY.
--


Also, COPY to foreign tables currently uses INSERTs; the improvement of using 
COPY instead of INSERT is in progress [1].  Keeping "COPY uses COPY, INSERT 
uses INSERT" correspondence seems natural, and it makes COPY's high-speed 
advantage stand out.


[1]
Fast COPY FROM command for the table with foreign partitions
https://www.postgresql.org/message-id/flat/3d0909dc-3691-a576-208a-90986e55489f%40postgrespro.ru


Regards
Takayuki Tsunakawa





RE: POC: postgres_fdw insert batching

2020-11-30 Thread tsunakawa.ta...@fujitsu.com
From: Craig Ringer 
> It was not my intention to hold this patch up or greatly expand its
> scope. I'll spend some time testing it out and have a closer read soon
> to see if I can help progress it.

Thank you, I'm relieved to hear that.  Last weekend, I was scared of a possible 
mood that's something like "We won't accept the insert speedup patch for 
foreign tables unless you take full advantage of pipelining and achieve maximum 
conceivable speed!"


> I thought I gave it at the time, and a demo program. IIRC it was just
> doing small multi row inserts or single row inserts. Latency would've
> been a couple of hundred ms probably, I think I did something like
> running on my laptop (Australia, ADSL) to a server on AWS US or EU.

A couple of hundred ms, so that would be dominant in each 
prepare-send-execute-receive cycle, possibly even for a batch insert with hundreds 
of rows in each batch.  Then, the synchronous batch insert of the current patch 
may achieve a speedup of a few hundred times compared to single-row inserts when 
the batch size is hundreds or more.


> > I'd like to check other DBMSs and your rich references for the FDW 
> > interface.
> (My first intuition is that many major DBMSs might not have client C APIs that
> can be used to implement an async pipelining FDW interface.
> 
> Likely correct for C APIs of other traditional DBMSes. I'd be less
> sure about newer non SQL ones, especially cloud oriented. For example
> DynamoDB supports at least async requests in the Java client [3] and
> C++ client [4]; it's not immediately clear if requests can be
> pipelined, but the API suggests they can.

I've checked ODBC, MySQL, Microsoft Synapse Analytics, Redshift, and BigQuery, 
guessing that such data warehouses may have asynchronous/pipelining APIs that 
enable efficient data integration/migration.  But none of them had one.  (I 
seem to have spent too long and am a bit tired... but it was a bit fun as 
well.)  They all support INSERT with multiple records in its VALUES clause.  
So, it will be useful to provide a synchronous batch insert FDW API.  I guess 
Oracle's OCI has an asynchronous API, but I didn't check it.

As an aside, MySQL 8.0.16 added support for asynchronous execution in its C 
API, but it allows only one active SQL statement in each connection.  Likewise, 
although the ODBC standard defines asynchronous execution 
(SQLSetStmtAttr(SQL_ASYNC_ENABLE) and SQLCompleteAsync), SQL Server and Synapse 
Analytics allow only one active statement per connection.  psqlODBC 
doesn't support asynchronous execution.


> Most things with a REST-like API can do a fair bit of concurrency
> though. Multiple async nonblocking HTTP connections can be serviced at
> once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
> streams [2]. This is relevant for any REST-like API.

I'm not sure if this is related, but Google deprecated its Batch HTTP API [1].


[1]
https://cloud.google.com/bigquery/batch


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2020-11-30 Thread David Fetter
On Wed, Nov 25, 2020 at 05:04:36AM +, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
> > On 11/24/20 9:45 AM, tsunakawa.ta...@fujitsu.com wrote:
> > > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> > GetInsertBatchSize may be better.  That is, this API deals with multiple
> > records in a single INSERT statement.  Your GetModifyBatchSize will be
> > reserved for statement batching when libpq has supported batch/pipelining to
> > execute multiple INSERT/UPDATE/DELETE statements, as in the following
> > JDBC batch updates.  What do you think?
> > >
> > 
> > I don't know. I was really only thinking about batching in the context
> > of a single DML command, not about batching of multiple commands at the
> > protocol level. IMHO it's far more likely we'll add support for batching
> > for DELETE/UPDATE than libpq pipelining, which seems rather different
> > from how the FDW API works. Which is why I was suggesting to use a name
> > that would work for all DML commands, not just for inserts.
> 
> Right, I can't imagine right now how the interaction among the client, server core 
> and FDWs would work for statement batching.  So I'll take your 
> suggested name.
> 
> 
> > Not sure, but I'd guess knowing whether batching is used would be
> > useful. We only print the single-row SQL query, which kinda gives the
> > impression that there's no batching.
> 
> Added in postgres_fdw like "Remote SQL" when EXPLAIN VERBOSE is run.
> 
> 
> > > Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a 
> > > value
> > that was already saved in a struct in create_foreign_modify().
> > >
> > 
> > Well, I do worry for two reasons.
> > 
> > Firstly, the fact that in postgres_fdw the call is cheap does not mean
> > it'll be like that in every other FDW. Presumably, the other FDWs might
> > cache it in the struct and do the same thing, of course.
> > 
> > But the fact that we're calling it over and over for each row kinda
> > seems like we allow the value to change during execution, but I very
> > much doubt the code is expecting that. I haven't tried, but assume the
> > function first returns 10 and then 100. ISTM the code will allocate
> > ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> > That can't end well. Sure, we can claim it's a bug in the FDW extension,
> > but it's also due to the API design.
> 
> You worried about other FDWs than postgres_fdw.  That's reasonable.  I 
> insisted in other threads that PG developers care only about postgres_fdw, 
> not other FDWs, when designing the FDW interface, but I myself made the same 
> mistake.  I made changes so that the executor calls GetModifyBatchSize() once 
> per relation per statement.

Please pardon me for barging in late in this discussion, but if we're
going to be using a bulk API here, wouldn't it make more sense to use
COPY, except where RETURNING is specified, in place of INSERT?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: POC: postgres_fdw insert batching

2020-11-29 Thread Craig Ringer
On Fri, 27 Nov 2020, 14:06 tsunakawa.ta...@fujitsu.com,
 wrote:
> Also, I'm afraid it requires major surgery or reform of executor.  I
> don't want it to delay the release of reasonably good (10x)
> improvement with the synchronous interface.)


Totally sensible. If it isn't feasible without significant executor
change, that's all that needs to be said.

I was afraid that'd be the case given the executor's pull flow but
just didn't know enough.

It was not my intention to hold this patch up or greatly expand its
scope. I'll spend some time testing it out and have a closer read soon
to see if I can help progress it.

I know Andres did some initial work on executor parallelism and
pipelining. I should take a look.

> > But in the libpq pipelining patch I demonstrated a 300 times (3000%) 
> > performance improvement on a test workload...
>
> Wow, impressive number.  I've just seen it in the beginning of the libpq 
> pipelining thread (oh, already four years ago..!)

Yikes.

> Could you share the workload and the network latency (ping time)?  I'm sorry 
> I'm just overlooking it.

I thought I gave it at the time, and a demo program. IIRC it was just
doing small multi row inserts or single row inserts. Latency would've
been a couple of hundred ms probably, I think I did something like
running on my laptop (Australia, ADSL) to a server on AWS US or EU.

> Thank you for your (always) concise explanation.

You joke! I am many things but despite my best efforts concise is
rarely one of them.

> I'd like to check other DBMSs and your rich references for the FDW interface. 
>  (My first intuition is that many major DBMSs might not have client C APIs 
> that can be used to implement an async pipelining FDW interface.

Likely correct for C APIs of other traditional DBMSes. I'd be less
sure about newer non SQL ones, especially cloud oriented. For example
DynamoDB supports at least async requests in the Java client [3] and
C++ client [4]; it's not immediately clear if requests can be
pipelined, but the API suggests they can.

Most things with a REST-like API can do a fair bit of concurrency
though. Multiple async nonblocking HTTP connections can be serviced at
once. Or HTTP/1.1 pipelining can be used [1], or even better HTTP/2.0
streams [2]. This is relevant for any REST-like API.

> (It'd be kind of you to send emails in text format.  I've changed the format 
> of this reply from HTML to text.)

I try to remember. Stupid Gmail.  Sorry. On mobile it offers very
little control over format but I'll do my best when I can.

[1] https://en.wikipedia.org/wiki/HTTP_pipelining
[2] https://blog.restcase.com/http2-benefits-for-rest-apis/
[3] 
https://aws.amazon.com/blogs/developer/asynchronous-requests-with-the-aws-sdk-for-java/
[4] 
https://sdk.amazonaws.com/cpp/api/LATEST/class_aws_1_1_dynamo_d_b_1_1_dynamo_d_b_client.html#ab631edaccca5f3f8988af15e7e9aa4f0




Re: POC: postgres_fdw insert batching

2020-11-27 Thread Craig Ringer
On Sat, 28 Nov 2020, 10:10 Tomas Vondra, 
wrote:

>
>
> On 11/27/20 7:05 AM, tsunakawa.ta...@fujitsu.com wrote:
>
> However, the FDW interface as it's implemented today is not designed to
> allow that, I believe (we pretty much just invoke the FDW callbacks as
> if it was a local AM). It assumes the calls are synchronous, and
> redesigning it to work in async way is a much larger/complex patch than
> what's being discussed here.
>
> I do think the FDW extension proposed here (adding the bulk-insert
> callback) is useful in general, for two reasons: (a) even if most client
> libraries support some sort of pipelining, some don't, and (b) I'd bet
> it's still more efficient to send one large insert than pipelining many
> individual inserts.
>
> That being said, I'm against expanding the scope of this patch to also
> require redesign of the whole FDW infrastructure - that would likely
> mean no such improvement landing in PG14. If the libpq pipelining patch
> seems likely to get committed, we can try using it for the bulk insert
> callback (instead of the current multi-value stuff).
>

I totally agree on all points. It was not my intent to expand the scope of
this significantly and I really don't want to hold it back.

I raised the interface consideration in case it was something easy to
accommodate. It's not, so that's done, topic over.


Re: POC: postgres_fdw insert batching

2020-11-27 Thread Tomas Vondra



On 11/27/20 7:05 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Craig Ringer 
>> But in the libpq pipelining patch I demonstrated a 300 times
>> (3000%) performance improvement on a test workload...
> 
> Wow, impressive  number.  I've just seen it in the beginning of the
> libpq pipelining thread (oh, already four years ago..!)  Could you
> share the workload and the network latency (ping time)?  I'm sorry
> I'm just overlooking it.
> 
> Thank you for your (always) concise explanation.  I'd like to check
> other DBMSs and your rich references for the FDW interface.  (My
> first intuition is that many major DBMSs might not have client C APIs
> that can be used to implement an async pipelining FDW interface.
> Also, I'm afraid it requires major surgery or reform of executor.  I
> don't want it to delay the release of reasonably good (10x)
> improvement with the synchronous interface.)
> 

I do agree that pipelining is nice, and can bring huge improvements.

However, the FDW interface as it's implemented today is not designed to
allow that, I believe (we pretty much just invoke the FDW callbacks as
if it was a local AM). It assumes the calls are synchronous, and
redesigning it to work in async way is a much larger/complex patch than
what's being discussed here.

I do think the FDW extension proposed here (adding the bulk-insert
callback) is useful in general, for two reasons: (a) even if most client
libraries support some sort of pipelining, some don't, and (b) I'd bet
it's still more efficient to send one large insert than pipelining many
individual inserts.

That being said, I'm against expanding the scope of this patch to also
require redesign of the whole FDW infrastructure - that would likely
mean no such improvement landing in PG14. If the libpq pipelining patch
seems likely to get committed, we can try using it for the bulk insert
callback (instead of the current multi-value stuff).


> (It'd be kind of you to send emails in text format.  I've changed the
> format of this reply from HTML to text.)
> 

Craig's client is sending messages in both text/plain and text/html. You
probably need to tell your client to prefer text/plain over html, somehow.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: POC: postgres_fdw insert batching

2020-11-26 Thread tsunakawa.ta...@fujitsu.com
From: Craig Ringer  
> But in the libpq pipelining patch I demonstrated a 300 times (3000%) 
> performance improvement on a test workload...

Wow, impressive number.  I've just seen it in the beginning of the libpq 
pipelining thread (oh, already four years ago..!)  Could you share the workload 
and the network latency (ping time)?  I'm sorry I'm just overlooking it.

Thank you for your (always) concise explanation.  I'd like to check other DBMSs 
and your rich references for the FDW interface.  (My first intuition is that 
many major DBMSs might not have client C APIs that can be used to implement an 
async pipelining FDW interface.  Also, I'm afraid it requires major surgery or 
reform of executor.  I don't want it to delay the release of reasonably good 
(10x) improvement with the synchronous interface.)

(It'd be kind of you to send emails in text format.  I've changed the format of 
this reply from HTML to text.)


Regards
Takayuki Tsunakawa
 


Re: POC: postgres_fdw insert batching

2020-11-26 Thread Craig Ringer
On Fri, Nov 27, 2020 at 10:47 AM tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

Covering this one first:

> I expect postgresExecForeignBatchInsert() would be able to use the libpq
> batching API, because it receives an array of tuples and can generate and
> issue an INSERT statement for each tuple.


Sure, you can generate big multi-inserts. Or even do a COPY. But you still
have to block for a full round-trip until the foreign server replies. So if
you have 6000 calls to postgresExecForeignBatchInsert() during a single
query, and a 100ms round trip time to the foreign server, you're going to
waste 6000*0.1 = 600s = 10 min blocked in postgresExecForeignBatchInsert()
waiting for results from the foreign server.

Such batches have some major downsides:

* The foreign server cannot start executing the first query in the batch
until the last query in the batch has been accumulated and the whole batch
has been sent to the foreign server;
* The FDW has to block waiting for the batch to execute on the foreign
server and for a full network round-trip before it can start another batch
or let the backend do other work.

This means RTTs get multiplied by batch counts. Still a lot better than
individual statements, but plenty slow for high latency connections.

* Prepare 1000 rows to insert [10ms]
* INSERT 1000 values [100ms RTT + 50ms foreign server execution time]
* Prepare 1000 rows to insert [10ms]
* INSERT 1000 values [100ms RTT + 50ms foreign server execution time]
* ...

If you can instead send new inserts (or sets of inserts) to the foreign
server without having to wait for the result of the previous batch to
arrive, you can spend 100ms total waiting for results instead of 10 mins.
You can start the execution of the first query earlier, spend less time
blocked waiting on network, and let the local backend continue doing other
work while the foreign server is busy executing the statements.

The time spent preparing local rows to insert now overlaps with the RTT and
remote execution time, instead of happening serially. And there only has to
be one RTT wait, assuming the foreign server and network can keep up with
the rate we are generating requests at.

I can throw together some diagrams if it'll help. But in the libpq
pipelining patch I demonstrated a 300 times (3000%) performance improvement
on a test workload...

> Anyway, this thread's batch insert can be progressed (and hopefully
> committed), and once the libpq batching has been committed, we can give it
> a try to use it and modify postgres_fdw to see if we can get further
> performance boost.
>

My point is that you should seriously consider whether batching is the
appropriate interface here, or whether the FDW can expose a pipeline-like
"queue work" then "wait for results" interface. That can be used to
implement batching exactly as currently proposed, it does not have to wait
for any libpq pipelining features. But it can *also* be used to implement
concurrent async requests in other FDWs, and to implement pipelining in
postgres_fdw once the needed libpq support is available.

I don't know the FDW to postgres API well enough, and it's possible I'm
talking entirely out of my hat here.


> From: Tomas Vondra 
> > Not sure how this is related to app developers? I think the idea was
> > that the libpq features might be useful between the two PostgreSQL
> > instances. I.e. the postgres_fdw would use the libpq batching to send
> > chunks of data to the other side.
>
> > Well, my point was that we could keep the API, but maybe it should be
> > implemented using the proposed libpq batching. They could still use the
> > postgres_fdw example of how to use the API, but the internals would need to
> > be different, of course.
>
> Yes, I understand them.  I just wondered whether app developers use the
> statement batching API of libpq or JDBC, and in what kind of apps.


For JDBC, yes, it's used very heavily and has been for a long time, because
PgJDBC doesn't rely on libpq - it implements the protocol directly and
isn't bound by libpq's limitations. The application interface for it in
JDBC is a batch interface [1][2], not a pipelined interface, so that's what
PgJDBC users interact with [3], but batch execution is implemented using
protocol pipelining support inside PgJDBC [4]. A while ago I did some work
on deadlock prevention to work around issues with PgJDBC's implementation
[5] which was needed because the feature was so heavily used. Both were to
address customer needs in real world applications. The latter increased
application performance over 50x through round-trip elimination.

For libpq, no, batching and pipelining are not yet used by anybody because
application authors have to write to the libpq API and there hasn't been
any in-core support for batching. We've had async / non-blocking support
for a while, but it still enforces strict request/response ordering without
interleaving, so application authors cannot make use of the same postgres
server and proto

RE: POC: postgres_fdw insert batching

2020-11-26 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Not sure how this is related to app developers? I think the idea was
> that the libpq features might be useful between the two PostgreSQL
> instances. I.e. the postgres_fdw would use the libpq batching to send
> chunks of data to the other side.

> Well, my point was that we could keep the API, but maybe it should be
> implemented using the proposed libpq batching. They could still use the
> postgres_fdw example of how to use the API, but the internals would need to
> be different, of course.

Yes, I understand them.  I just wondered whether app developers use the statement 
batching API of libpq or JDBC, and in what kind of apps.  That is, I talked about 
the batching API itself, not related to FDW.  (So, I mentioned I think I should 
ask such a question in the libpq batching thread.)

I expect postgresExecForeignBatchInsert() would be able to use the libpq 
batching API, because it receives an array of tuples and can generate and issue 
an INSERT statement for each tuple.  But I'm not sure either if the libpq batching 
is likely to be committed in the near future.  (The thread looks too long...)  
Anyway, this thread's batch insert can be progressed (and hopefully committed), 
and once the libpq batching has been committed, we can give it a try to use it 
and modify postgres_fdw to see if we can get further performance boost.


Regards
Takayuki Tsunakawa




Re: POC: postgres_fdw insert batching

2020-11-26 Thread Craig Ringer
On Fri, Nov 27, 2020 at 3:34 AM Tomas Vondra 
wrote:


> Not sure how this is related to app developers? I think the idea was
> that the libpq features might be useful between the two PostgreSQL
> instances. I.e. the postgres_fdw would use the libpq batching to send
> chunks of data to the other side.
>

Right. Or at least, when designing the FDW API, do so in a way that doesn't
strictly enforce Request/Response alternation without interleaving, so you
can benefit from it in the future.

It's hardly just libpq after all. A *lot* of client libraries and drivers
will be capable of non-blocking reads or writes with multiple ones in
flight at once. Any REST-like API generally can, for example. So for
performance reasons we should if possible avoid baking the assumption that
a request cannot be made until the response from the previous request is
received, and instead have a wait interface to use for when a new request
requires the prior response's result before it can proceed.

Well, my point was that we could keep the API, but maybe it should be
> implemented using the proposed libpq batching. They could still use the
> postgres_fdw example of how to use the API, but the internals would need to
> be different, of course.
>

Sure. Or just allow room for it in the FDW API, though using the pipelining
support natively would be nice.

If the FDW interface allows Pg to

Insert A
Insert B
Insert C
Wait for outcome of insert A
...

then that'll be useful for using libpq pipelining, but also FDWs for all
sorts of other DBs, especially cloud-y ones where latency is a big concern.
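
(To make that shape concrete, a hypothetical queue/wait callback pair --
an editor's sketch only, not an existing or proposed PostgreSQL API:)

  /* hypothetical additions to FdwRoutine, purely illustrative */
  typedef struct FdwModifyRequest FdwModifyRequest;   /* opaque handle */

  /* enqueue an insert without waiting for the remote result */
  FdwModifyRequest *(*QueueForeignInsert) (EState *estate,
                                           ResultRelInfo *rinfo,
                                           TupleTableSlot *slot);

  /* block until a previously queued request has completed */
  void        (*WaitForeignModify) (EState *estate,
                                    FdwModifyRequest *request);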


Re: POC: postgres_fdw insert batching

2020-11-26 Thread Tomas Vondra



On 11/26/20 2:48 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> Well, good that we all agree this is a useful feature to have (in 
>> general). The question is whether postgres_fdw should be doing 
>> batching on its own (per this thread) or rely on some other
>> feature (libpq pipelining). I haven't followed the other thread,
>> so I don't have an opinion on that.
> 
> Well, as someone said in this thread, I think bulk insert is much 
> more common than updates/deletes.  Thus, major DBMSs have INSERT 
> VALUES(record1), (record2)... and INSERT SELECT.  Oracle has direct 
> path INSERT in addition.  As for the comparison of INSERT with 
> multiple records and libpq batching (= multiple INSERTs), I think
> the former is more efficient because the amount of data transfer is
> less and the parsing-planning of INSERT for each record is
> eliminated.
> 
> I never deny the usefulness of libpq batch/pipelining, but I'm not 
> sure if app developers would really use it.  If they want to reduce 
> the client-server round-trips, won't they use traditional stored 
> procedures?  Yes, the stored procedure language is very 
> DBMS-specific.  Then, I'd like to know what kind of well-known 
> applications are using standard batching API like JDBC's batch 
> updates.  (Sorry, I think that should be discussed in libpq 
> batch/pipelining thread and this thread should not be polluted.)
> 

Not sure how this is related to app developers? I think the idea was
that the libpq features might be useful between the two PostgreSQL
instances. I.e. the postgres_fdw would use the libpq batching to send
chunks of data to the other side.

> 
>> Note however we're doing two things here, actually - we're 
>> implementing custom batching for postgres_fdw, but we're also 
>> extending the FDW API to allow other implementations to do the same 
>> thing. And most of them won't be able to rely on the connection 
>> library providing that, I believe.
> 
> I'm afraid so, too.  Then, postgres_fdw would be an example that 
> other FDW developers would look at when they use INSERT with
> multiple records.
> 

Well, my point was that we could keep the API, but maybe it should be
implemented using the proposed libpq batching. They could still use the
postgres_fdw example of how to use the API, but the internals would need to
be different, of course.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: POC: postgres_fdw insert batching

2020-11-25 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Well, good that we all agree this is a useful feature to have (in
> general). The question is whether postgres_fdw should be doing batching
> on its own (per this thread) or rely on some other feature (libpq
> pipelining). I haven't followed the other thread, so I don't have an
> opinion on that.

Well, as someone said in this thread, I think bulk insert is much more common 
than updates/deletes.  Thus, major DBMSs have INSERT VALUES(record1), 
(record2)... and INSERT SELECT.  Oracle has direct path INSERT in addition.  As 
for the comparison of INSERT with multiple records and libpq batching (= 
multiple INSERTs), I think the former is more efficient because the amount of 
data transfer is less and the parsing-planning of INSERT for each record is 
eliminated.

I never deny the usefulness of libpq batch/pipelining, but I'm not sure if app 
developers would really use it.  If they want to reduce the client-server 
round-trips, won't they use traditional stored procedures?  Yes, the stored 
procedure language is very DBMS-specific.  Then, I'd like to know what kind of 
well-known applications are using standard batching API like JDBC's batch 
updates.  (Sorry, I think that should be discussed in libpq batch/pipelining 
thread and this thread should not be polluted.)


> Note however we're doing two things here, actually - we're implementing
> custom batching for postgres_fdw, but we're also extending the FDW API
> to allow other implementations to do the same thing. And most of them won't
> be able to rely on the connection library providing that, I believe.

I'm afraid so, too.  Then, postgres_fdw would be an example that other FDW 
developers would look at when they use INSERT with multiple records.


Regards
Takayuki Tsunakawa






Re: POC: postgres_fdw insert batching

2020-11-25 Thread Tomas Vondra
On 11/25/20 7:31 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Craig Ringer 
>> I suggest that when developing this, you keep in mind the ongoing
>> work on the libpq pipelining/batching enhancements, and also the
>> way many interfaces to foreign data sources support asynchronous,
>> concurrent operations.
> 
> Yes, thank you, I bear it in mind.  I understand it's a feature for
> batching multiple kinds of SQL statements like JDBC's batch updates.
> 

I haven't followed the libpq pipelining thread very closely. It does
seem related, but I'm not sure if it's a good match for this patch, or
how far it is from being committable ...

> 
>> I'd argue it's pretty much vital for decent performance when
>> talking to a cloud database from an on-prem server for example, or
>> any other time that round-trip-time reduction is important.
> 
> Yeah, I'm thinking of the data migration and integration as the
> prominent use case.
> 

Well, good that we all agree this is a useful feature to have (in
general). The question is whether postgres_fdw should be doing batching
on its own (per this thread) or rely on some other feature (libpq
pipelining). I haven't followed the other thread, so I don't have an
opinion on that.

Note however we're doing two things here, actually - we're implementing
custom batching for postgres_fdw, but we're also extending the FDW API
to allow other implementations to do the same thing. And most of them won't
be able to rely on the connection library providing that, I believe.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: POC: postgres_fdw insert batching

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Craig Ringer  
> I suggest that when developing this, you keep in mind the ongoing work on the 
> libpq pipelining/batching enhancements, and also the way many interfaces to 
> foreign data sources support asynchronous, concurrent operations.

Yes, thank you, I'll bear it in mind.  I understand it's a feature for batching 
multiple kinds of SQL statements, like JDBC's batch updates.


> I'd argue it's pretty much vital for decent performance when talking to a 
> cloud database from an on-prem server for example, or any other time that 
> round-trip-time reduction is important.

Yeah, I'm thinking of the data migration and integration as the prominent use 
case.


Regards
Takayuki Tsunakawa



RE: POC: postgres_fdw insert batching

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> On 11/24/20 9:45 AM, tsunakawa.ta...@fujitsu.com wrote:
> > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> GetInsertBatchSize may be better.  That is, this API deals with multiple
> records in a single INSERT statement.  Your GetModifyBatchSize would be
> reserved for statement batching once libpq supports batch/pipelining to
> execute multiple INSERT/UPDATE/DELETE statements, as in the following
> JDBC batch updates.  What do you think?
> >
> 
> I don't know. I was really only thinking about batching in the context
> of a single DML command, not about batching of multiple commands at the
> protocol level. IMHO it's far more likely we'll add support for batching
> for DELETE/UPDATE than libpq pipelining, which seems rather different
> from how the FDW API works. Which is why I was suggesting to use a name
> that would work for all DML commands, not just for inserts.

Right, I can't currently imagine how the interaction among the client, the 
server core and FDWs would work for statement batching.  So I'll take your 
suggested name.


> Not sure, but I'd guess knowing whether batching is used would be
> useful. We only print the single-row SQL query, which kinda gives the
> impression that there's no batching.

Added it to the postgres_fdw output, next to "Remote SQL", when EXPLAIN VERBOSE is run.
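
For instance, the output could look roughly like this (relation names
invented, shown only as a sketch of the intended display):

  EXPLAIN (VERBOSE, COSTS OFF) INSERT INTO ft VALUES (1, 2);
                            QUERY PLAN
  ------------------------------------------------------------------
   Insert on public.ft
     Remote SQL: INSERT INTO public.base_t(a, b) VALUES ($1, $2)
     Batch Size: 100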


> > Don't worry about this either.  GetMaxBulkInsertTuples() just returns a value
> that was already saved in a struct in create_foreign_modify().
> >
> 
> Well, I do worry for two reasons.
> 
> Firstly, the fact that in postgres_fdw the call is cheap does not mean
> it'll be like that in every other FDW. Presumably, the other FDWs might
> cache it in the struct and do the same thing, of course.
> 
> But the fact that we're calling it over and over for each row kinda
> seems like we allow the value to change during execution, but I very
> much doubt the code is expecting that. I haven't tried, but assume the
> function first returns 10 and then 100. ISTM the code will allocate
> ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> That can't end well. Sure, we can claim it's a bug in the FDW extension,
> but it's also due to the API design.

Your worry about FDWs other than postgres_fdw is reasonable.  I have insisted 
in other threads that PG developers care only about postgres_fdw, not other 
FDWs, when designing the FDW interface, but I made the same mistake myself.  I 
changed the code so that the executor calls GetModifyBatchSize() once per 
relation per statement.


Regards
Takayuki Tsunakawa



v5-0001-Add-bulk-insert-for-foreign-tables.patch
Description: v5-0001-Add-bulk-insert-for-foreign-tables.patch


Re: POC: postgres_fdw insert batching

2020-11-24 Thread Craig Ringer
On Thu, Oct 8, 2020 at 10:40 AM tsunakawa.ta...@fujitsu.com <
tsunakawa.ta...@fujitsu.com> wrote:

>
> Thank you for picking up this.  I'm interested in this topic, too.  (As an
> aside, we'd like to submit a bulk insert patch for ECPG in the near future.)
>
> As others mentioned, Andrey-san's fast COPY to foreign partitions is also
> promising.  But I think your bulk INSERT is a separate feature and offers
> what COPY cannot do -- data transformation during loading with INSERT SELECT
> and CREATE TABLE AS SELECT.
>
> Is there anything that worries you and stops development?  Could I give
> implementing this a try?  (I'm not sure I can, sorry -- I'm worried about
> whether we can change the executor's call chain easily.)
>
>
I suggest that when developing this, you keep in mind the ongoing work on
the libpq pipelining/batching enhancements, and also the way many
interfaces to foreign data sources support asynchronous, concurrent
operations.

Best results with postgres_fdw insert batching would be achieved if it can
also send its batches as asynchronous queries and only block when it's
required to report on the results of the work. This will also be true of
any other FDW where the backing remote interface can support asynchronous
concurrent or pipelined operation.

I'd argue it's pretty much vital for decent performance when talking to a
cloud database from an on-prem server for example, or any other time that
round-trip-time reduction is important.

The most important characteristic of an FDW API to permit this would be
decoupling of request and response into separate non-blocking calls that
don't have to occur in ordered pairs. Instead of "insert_foo(foo) ->
insert_result", have "queue_insert_foo(foo) -> future_result",
"get_result_if_available(future_result) -> maybe result" and
"get_result_blocking(future_result) -> result". Permit multiple
queue_insert_foo(...)s without a/b interleaving with result fetches being
required.

Ideally it'd be able to accumulate small batches of inserts locally and
send a batch to the remote end once it's accumulated enough. But instead of
blocking waiting for the result, return control to the executor after
sending, without forcing a socket flush (which might block) and without
waiting to learn what the outcome was. Allow new batches to be accumulated
and sent before the results of the first batch are received, so long as
it's within the same executor node so we don't make any unfortunate
mistakes with mixing things up in compound statements or functions etc.
Only report outcomes like rowcounts lazily when results are received, or
when required to do so.

If now we have

REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT
~~ round trip delay ~~
REQUEST -> [block] -> RESULT

and batching would give us

{ REQUEST, REQUEST} -> [block] -> { RESULT, RESULT }
~~ round trip delay ~~
{ REQUEST, REQUEST} -> [block] -> { RESULT, RESULT }

consider if room can be left in the batching API to permit:

{ REQUEST, REQUEST} -> [nonblocking send...]
{ REQUEST, REQUEST} -> [nonblocking send...]
~~ round trip delay ~~
[] -> RESULT, RESULT
[] -> RESULT, RESULT


... where we only actually block at the point where the result is required
as input into the next node.

Honestly I don't know the executor structure well enough to say if this is
even remotely feasible right now. Maybe Andres may be able to comment. But
please keep it in mind if you're thinking of making FDW API changes.


Re: POC: postgres_fdw insert batching

2020-11-24 Thread Tomas Vondra



On 11/24/20 9:45 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> 1) We're calling it "batch_size" but the API function is named
>> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
>> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
>> ever add support for batching to UPDATE/DELETE.
> 
> Actually, I was in two minds about whether the term batch or bulk is better,
> because Oracle uses "bulk insert" and "bulk fetch", as in FETCH cur BULK 
> COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses batch, as 
> in "batch updates" and its API method names (addBatch, executeBatch).
> 
> But "batch" seems better, or at least more common, according to the etymology 
> and the following Stack Overflow page:
> 
> https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch
> 
> OTOH, as for the name GetModifyBatchSize() you suggest, I think 
> GetInsertBatchSize may be better.  That is, this API deals with multiple 
> records in a single INSERT statement.  Your GetModifyBatchSize would be 
> reserved for statement batching once libpq supports batch/pipelining to 
> execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC 
> batch updates.  What do you think?
> 

I don't know. I was really only thinking about batching in the context
of a single DML command, not about batching of multiple commands at the
protocol level. IMHO it's far more likely we'll add support for batching
for DELETE/UPDATE than libpq pipelining, which seems rather different
from how the FDW API works. Which is why I was suggesting to use a name
that would work for all DML commands, not just for inserts.

> CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
> --
> Statement stmt = con.createStatement(); 
> stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
> stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
> stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 
> 
> // submit a batch of update commands for execution 
> int[] updateCounts = stmt.executeBatch(); 
> --
> 

Sure. We already have a patch to support something like this at the
libpq level, IIRC. But I'm not sure how well that matches the FDW API
approach in general.

> 
>> 2) Do we have to lookup the batch_size in create_foreign_modify (in
>> server/table options)? I'd have expected to look it up while planning
>> the modify and then pass it through the list, just like the other
>> FdwModifyPrivateIndex stuff. But maybe that's not possible.
> 
> Don't worry, create_foreign_modify() is called from PlanForeignModify() 
> during planning.  Unfortunately, it's also called from BeginForeignInsert(), 
> but the other stuff passed to create_foreign_modify(), including the query
> string, is constructed there.
> 

Hmm, ok.

> 
>> 3) That reminds me - should we show the batching info on EXPLAIN? That
>> seems like a fairly interesting thing to show to the user. Perhaps
>> showing the average batch size would also be useful? Or maybe not, we
>> create the batches as large as possible, with the last one smaller.
> 
> Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change 
> dynamically based on the planning or system state unlike shared buffers and 
> parallel workers.  OTOH, I sometimes want to see what configuration parameter 
> values the user set, such as work_mem, enable_*, and shared_buffers, together 
> with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if 
> EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters 
> could be included in that output.
> 

Not sure, but I'd guess knowing whether batching is used would be
useful. We only print the single-row SQL query, which kinda gives the
impression that there's no batching.

>> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
>> over for every tuple. I don't know if that has a measurable impact, but it
>> seems a bit excessive IMO. I don't think we should support the batch
>> size changing during execution (seems tricky).
> 
> Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a value 
> that was already saved in a struct in create_foreign_modify().
> 

Well, I do worry for two reasons.

Firstly, the fact that in postgres_fdw the call is cheap does not mean
it'll be like that in every other FDW. Presumably, the other FDWs might
cache it in the struct and do the same thing, of course.

But the fact that we're calling it over and over for each row kinda
seems like we allow the value to change during execution, but I very
much doubt the code is expecting that. I haven't tried, but assume the
function first returns 10 and then 100. ISTM the code will allocate
ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
That can't end well. Sure, we can claim it's a bug in the FDW extension,
but it's also due to the API design.

RE: POC: postgres_fdw insert batching

2020-11-24 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> 1) We're calling it "batch_size" but the API function is named
> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
> ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether the term batch or bulk is better, 
because Oracle uses "bulk insert" and "bulk fetch", as in FETCH cur BULK COLLECT 
INTO array and FORALL in array INSERT INTO, while JDBC uses batch, as in "batch 
updates" and its API method names (addBatch, executeBatch).

But "batch" seems better, or at least more common, according to the etymology 
and the following Stack Overflow page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think 
GetInsertBatchSize may be better.  That is, this API deals with multiple 
records in a single INSERT statement.  Your GetModifyBatchSize would be reserved 
for statement batching once libpq supports batch/pipelining to execute 
multiple INSERT/UPDATE/DELETE statements, as in the following JDBC batch 
updates.  What do you think?

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
--
Statement stmt = con.createStatement(); 
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 

// submit a batch of update commands for execution 
int[] updateCounts = stmt.executeBatch(); 
--


> 2) Do we have to lookup the batch_size in create_foreign_modify (in
> server/table options)? I'd have expected to look it up while planning
> the modify and then pass it through the list, just like the other
> FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry, create_foreign_modify() is called from PlanForeignModify() during 
planning.  Unfortunately, it's also called from BeginForeignInsert(), but the 
other stuff passed to create_foreign_modify(), including the query string, is 
constructed there.


> 3) That reminds me - should we show the batching info on EXPLAIN? That
> seems like a fairly interesting thing to show to the user. Perhaps
> showing the average batch size would also be useful? Or maybe not, we
> create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size is not for EXPLAIN because its value doesn't change 
dynamically based on the planning or system state unlike shared buffers and 
parallel workers.  OTOH, I sometimes want to see what configuration parameter 
values the user set, such as work_mem, enable_*, and shared_buffers, together 
with the query plan (EXPLAIN and auto_explain).  For example, it'd be nice if 
EXPLAIN (parameters on) could do that.  Some relevant FDW-related parameters 
could be included in that output.

> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
> over for every tuple. I don't know if that has a measurable impact, but it
> seems a bit excessive IMO. I don't think we should support the batch
> size changing during execution (seems tricky).

Don't worry about this either.  GetMaxBulkInsertTuples() just returns a value 
that was already saved in a struct in create_foreign_modify().


Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2020-11-23 Thread Tomas Vondra


On 11/23/20 3:17 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> I don't think this is usable in practice, because a single session
>> may be using multiple FDW servers, with different implementations,
>> latency to the data nodes, etc. It's unlikely a single GUC value
>> will be suitable for all of them.
> 
> That makes sense.  The row size varies from table to table, so the
> user may want to tune this option to reduce memory consumption.
> 
> I think the attached patch has reflected all your comments.  I hope
> this will pass..
> 

Thanks - I haven't had time for a thorough review yet, so I only skimmed
through the diff and did a couple of very simple tests. And I think
overall it looks quite nice.

A couple minor comments/questions:

1) We're calling it "batch_size" but the API function is named
postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
to postgresGetModifyBatchSize()? That has the advantage it'd work if we
ever add support for batching to UPDATE/DELETE.

2) Do we have to lookup the batch_size in create_foreign_modify (in
server/table options)? I'd have expected to look it up while planning
the modify and then pass it through the list, just like the other
FdwModifyPrivateIndex stuff. But maybe that's not possible.

3) That reminds me - should we show the batching info on EXPLAIN? That
seems like a fairly interesting thing to show to the user. Perhaps
showing the average batch size would also be useful? Or maybe not, we
create the batches as large as possible, with the last one smaller.

4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
over for every tuple. I don't know it that has measurable impact, but it
seems a bit excessive IMO. I don't think we should support the batch
size changing during execution (seems tricky).


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




RE: POC: postgres_fdw insert batching

2020-11-22 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> I don't think this is usable in practice, because a single session may
> be using multiple FDW servers, with different implementations, latency
> to the data nodes, etc. It's unlikely a single GUC value will be
> suitable for all of them.

That makes sense.  The row size varies from table to table, so the user may 
want to tune this option to reduce memory consumption.

I think the attached patch has reflected all your comments.  I hope this will 
pass..


Regards
Takayuki Tsunakawa





v4-0001-Add-bulk-insert-for-foreign-tables.patch
Description: v4-0001-Add-bulk-insert-for-foreign-tables.patch


Re: POC: postgres_fdw insert batching

2020-11-19 Thread Tomas Vondra
On 11/19/20 3:43 AM, tsunakawa.ta...@fujitsu.com wrote:
> From: Tomas Vondra 
>> Unfortunately, this does not compile for me, because
>> nodeModifyTable calls ExecGetTouchedPartitions, which is not
>> defined anywhere. Not sure what's that about, so I simply
>> commented-out this. That probably fails the partitioned cases, but
>> it allowed me to do some review and testing.
> 
> Ouch, sorry.  I'm ashamed to have forgotten including
> execPartition.c.
> 

No reason to feel ashamed. Mistakes do happen from time to time.

> 
>> There are a couple of other smaller changes. E.g. it undoes changes to 
>> finish_foreign_modify, and instead calls separate functions to
>> prepare the bulk statement. It also adds list_make5/list_make6
>> macros, so as to not have to do strange stuff with the parameter
>> lists.
> 
> Thanks, I'll gladly take them!  I wonder why I didn't think of
> separating deallocate_query() from finish_foreign_modify() ...
> perhaps my brain was dying.  As for list_make5/6(), I saw your first
> patch avoided adding them, so I thought you found them ugly (and I felt
> so, too.)  But thinking about it now, there's no reason to hesitate.
> 

I think it's often easier to look at changes like deallocate_query with a
bit of distance, not while hacking on the patch and just trying to make
it work somehow.

For the list_make# stuff, I think I decided to do the simplest thing
possible in the extension, without having to recompile the server. But I
think for a proper patch it's better to keep it more readable.

> ...
> 
>> 1) As I mentioned before, I really don't think we should be doing
>> deparsing in execute_foreign_modify - that's something that should
>> happen earlier, and should be in a deparse.c function.
> ...
>> The attached patch tries to address both of these points.
>> 
>> Firstly, it adds a new deparseBulkInsertSql function, that builds a
>> query for the "full" batch, and then uses those two queries - when
>> we get a full batch we use the bulk query, otherwise we use the
>> single-row query in a loop. IMO this is cleaner than deparsing
>> queries ad hoc in the execute_foreign_modify.
> ...
>> Of course, this might be worse when we don't have a full batch,
>> e.g. for a query that inserts only 50 rows with batch_size=100. If
>> this case is common, one option would be lowering the batch_size
>> accordingly. If we really want to improve this case too, I suggest
>> we pass more info than just a position of the VALUES clause - that
>> seems a bit too hackish.
> ...
>> Secondly, it adds the batch_size option to server/foreign table,
>> and uses that. This is not complete, though.
>> postgresPlanForeignModify currently passes a hard-coded value at
>> the moment, it needs to lookup the correct value for the 
>> server/table from RelOptInfo or something. And I suppose
>> ModifyTable infrastructure will need to determine the value in
>> order to pass the correct number of slots to the FDW API.
> 
> I can sort of understand your feeling, but I'd like to reconstruct
> the query and prepare it in execute_foreign_modify() because:
> 
> * Some of our customers use bulk insert in ECPG (INSERT ...
> VALUES (record1), (record2), ...) to insert a variable number of records
> per query.  (Oracle's Pro*C has such a feature.)  So, I want to be
> prepared to enable such a thing with FDW.
> 
> * The number of records to insert is not known during planning (in
> general), so it feels natural to prepare during the execution phase,
> or at least not unnatural.
> 

I think we should differentiate between "deparsing" and "preparing".

> * I wanted to avoid the overhead of building the full query string
> for 100-record insert statement during query planning, because it may
> be a bit costly for usual 1-record inserts.  (The overhead may be
> hidden behind the high communication cost of postgres_fdw, though.)
> 

Hmm, ok. I haven't tried how expensive that would be, but my assumption
was it's much cheaper than the latency we save. But maybe I'm wrong.

> So, in terms of code cleanness, how about moving my code for
> rebuilding the query string from execute_foreign_modify() to some new
> function in deparse.c?
> 

That might work, yeah. I suggest we do this:

1) try to use the same approach for both single-row inserts and larger
batches, to not have a lot of different branches

2) modify deparseInsertSql to produce not the "final" query but some
intermediate representation useful to generate queries inserting
arbitrary number of rows

3) in execute_foreign_modify remember the last number of rows, and only
rebuild/replan the query when it changes
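
To illustrate 2) and 3), the deparsed "template" would stay fixed and only
the VALUES list would be extended to match the current batch (table and
column names invented, just a sketch):

  -- intermediate / single-row form
  INSERT INTO public.base_t(a, b) VALUES ($1, $2)

  -- generated for a batch of 3 by repeating the parenthesized list
  INSERT INTO public.base_t(a, b) VALUES ($1, $2), ($3, $4), ($5, $6)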

> 
>> 2) I think the GUC should be replaced with an server/table option,
>> similar to fetch_size.
> 
> Hmm, batch_size differs from fetch_size.  fetch_size is a
> postgres_fdw-specific feature with no relevant FDW routine, while
> batch_size is a configuration parameter for all FDWs that implement
> ExecForeignBulkInsert().  The ideas I can think of are:
> 
> 1. Follow JDBC/ODBC and add standard FDW properties.

RE: POC: postgres_fdw insert batching

2020-11-18 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> Unfortunately, this does not compile for me, because nodeModifyTable calls
> ExecGetTouchedPartitions, which is not defined anywhere. Not sure what's
> that about, so I simply commented-out this. That probably fails the 
> partitioned
> cases, but it allowed me to do some review and testing.

Ouch, sorry.  I'm ashamed to have forgotten including execPartition.c.


> There are a couple of other smaller changes. E.g. it undoes changes to
> finish_foreign_modify, and instead calls separate functions to prepare the 
> bulk
> statement. It also adds list_make5/list_make6 macros, so as to not have to do
> strange stuff with the parameter lists.

Thanks, I'll gladly take them!  I wonder why I didn't think of separating 
deallocate_query() from finish_foreign_modify() ... perhaps my brain was dying. 
As for list_make5/6(), I saw your first patch avoided adding them, so I thought 
you found them ugly (and I felt so, too.)  But thinking about it now, there's no 
reason to hesitate.


> And finally, this should probably add a bunch of regression tests.

Sure.


> 1) As I mentioned before, I really don't think we should be doing deparsing in
> execute_foreign_modify - that's something that should happen earlier, and
> should be in a deparse.c function.
...
> The attached patch tries to address both of these points.
> 
> Firstly, it adds a new deparseBulkInsertSql function, that builds a query for 
> the
> "full" batch, and then uses those two queries - when we get a full batch we 
> use
> the bulk query, otherwise we use the single-row query in a loop. IMO this is
> cleaner than deparsing queries ad hoc in the execute_foreign_modify.
...
> Of course, this might be worse when we don't have a full batch, e.g. for a 
> query
> that inserts only 50 rows with batch_size=100. If this case is common, one
> option would be lowering the batch_size accordingly. If we really want to
> improve this case too, I suggest we pass more info than just a position of the
> VALUES clause - that seems a bit too hackish.
...
> Secondly, it adds the batch_size option to server/foreign table, and uses 
> that.
> This is not complete, though. postgresPlanForeignModify currently passes a
> hard-coded value at the moment, it needs to lookup the correct value for the
> server/table from RelOptInfo or something. And I suppose ModifyTable
> infrastructure will need to determine the value in order to pass the correct
> number of slots to the FDW API.

I can sort of understand your feeling, but I'd like to reconstruct the query 
and prepare it in execute_foreign_modify() because:

* Some of our customers use bulk insert in ECPG (INSERT ... VALUES (record1), 
(record2), ...) to insert a variable number of records per query.  (Oracle's 
Pro*C has such a feature.)  So, I want to be prepared to enable such a thing 
with FDW.

* The number of records to insert is not known during planning (in general), so 
it feels natural to prepare during the execution phase, or at least not 
unnatural.

* I wanted to avoid the overhead of building the full query string for 
100-record insert statement during query planning, because it may be a bit 
costly for usual 1-record inserts.  (The overhead may be hidden behind the high 
communication cost of postgres_fdw, though.)

So, in terms of code cleanness, how about moving my code for rebuilding the 
query string from execute_foreign_modify() to some new function in deparse.c?


> 2) I think the GUC should be replaced with a server/table option, similar to
> fetch_size.

Hmm, batch_size differs from fetch_size.  fetch_size is a postgres_fdw-specific 
feature with no relevant FDW routine, while batch_size is a configuration 
parameter for all FDWs that implement ExecForeignBulkInsert().  The ideas I can 
think of are:

1. Follow JDBC/ODBC and add standard FDW properties.  For example, the JDBC 
standard defines standard connection pool properties such as maxPoolSize and 
minPoolSize.  JDBC drivers have to provide them with those defined names.  
Likewise, the FDW interface requires FDW implementors to handle the foreign 
server option name "max_bulk_insert_tuples" if he/she wants to provide bulk 
insert feature and implement ExecForeignBulkInsert().  The core executor gets 
that setting from the FDW by calling a new FDW routine like 
GetMaxBulkInsertTuples().  Sigh...

2. Add a new max_bulk_insert_tuples reloption to CREATE/ALTER FOREIGN TABLE.  
The executor gets the value from Relation and uses it.  (But is this a 
table-specific configuration?  I don't think so, sigh...)

3. Adopt the current USERSET GUC max_bulk_insert_tuples.  I think this is 
enough because the user can change the setting per session, application, and 
database.



Regards
Takayuki Tsunakawa



Re: POC: postgres_fdw insert batching

2020-11-18 Thread Tomas Vondra


On 11/17/20 10:11 AM, tsunakawa.ta...@fujitsu.com wrote:
> Hello,
> 
> 
> Modified the patch as I talked with Tomas-san.  The performance
> results of loading one million records into a hash-partitioned table
> with 8 partitions are as follows:
> 
> unpatched, local: 8.6 seconds
> unpatched, fdw: 113.7 seconds
> patched, fdw: 12.5 seconds  (9x improvement)
> 
> The test scripts are also attached.  Run prepare.sql once to set up
> tables and source data.  Run local_part.sql and fdw_part.sql to load
> source data into a partitioned table with local partitions and a
> partitioned table with foreign tables respectively.
> 

Unfortunately, this does not compile for me, because nodeModifyTable
calls ExecGetTouchedPartitions, which is not defined anywhere. Not sure
what's that about, so I simply commented-out this. That probably fails
the partitioned cases, but it allowed me to do some review and testing.

As for the patch, I have a couple of comments

1) As I mentioned before, I really don't think we should be doing
deparsing in execute_foreign_modify - that's something that should
happen earlier, and should be in a deparse.c function.

2) I think the GUC should be replaced with a server/table option,
similar to fetch_size.
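
It could then be set the same way fetch_size is, e.g. (option name assumed;
values are strings, as with all FDW options):

  ALTER SERVER loopback OPTIONS (ADD batch_size '100');
  ALTER FOREIGN TABLE ft OPTIONS (ADD batch_size '10');  -- per-table override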

The attached patch tries to address both of these points.

Firstly, it adds a new deparseBulkInsertSql function, that builds a
query for the "full" batch, and then uses those two queries - when we
get a full batch we use the bulk query, otherwise we use the single-row
query in a loop. IMO this is cleaner than deparsing queries ad hoc in
the execute_foreign_modify.

Of course, this might be worse when we don't have a full batch, e.g. for
a query that inserts only 50 rows with batch_size=100. If this case is
common, one option would be lowering the batch_size accordingly. If we
really want to improve this case too, I suggest we pass more info than
just a position of the VALUES clause - that seems a bit too hackish.


Secondly, it adds the batch_size option to server/foreign table, and
uses that. This is not complete, though. postgresPlanForeignModify
currently passes a hard-coded value at the moment, it needs to lookup
the correct value for the server/table from RelOptInfo or something. And
I suppose ModifyTable infrastructure will need to determine the value
in order to pass the correct number of slots to the FDW API.

There are a couple of other smaller changes. E.g. it undoes changes to
finish_foreign_modify, and instead calls separate functions to prepare
the bulk statement. It also adds list_make5/list_make6 macros, so as to
not have to do strange stuff with the parameter lists.


And finally, this should probably add a bunch of regression tests.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>From 6a7031c800dff8fff9e1e64e0278494f3acd686f Mon Sep 17 00:00:00 2001
From: Takayuki Tsunakawa 
Date: Tue, 10 Nov 2020 09:27:56 +0900
Subject: [PATCH 1/3] Add bulk insert for foreign tables

---
 contrib/postgres_fdw/deparse.c|   3 +-
 contrib/postgres_fdw/postgres_fdw.c   | 233 ++
 contrib/postgres_fdw/postgres_fdw.h   |   2 +-
 doc/src/sgml/config.sgml  |  21 ++
 doc/src/sgml/fdwhandler.sgml  |  64 -
 src/backend/executor/nodeModifyTable.c| 151 
 src/backend/utils/misc/guc.c  |  12 +
 src/backend/utils/misc/postgresql.conf.sample |   1 +
 src/include/executor/nodeModifyTable.h|   2 +
 src/include/foreign/fdwapi.h  |   7 +
 src/include/nodes/execnodes.h |   5 +
 11 files changed, 446 insertions(+), 55 deletions(-)

diff --git a/contrib/postgres_fdw/deparse.c b/contrib/postgres_fdw/deparse.c
index 2d44df19fe..5aa81db08e 100644
--- a/contrib/postgres_fdw/deparse.c
+++ b/contrib/postgres_fdw/deparse.c
@@ -1706,7 +1706,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
  Index rtindex, Relation rel,
  List *targetAttrs, bool doNothing,
  List *withCheckOptionList, List *returningList,
- List **retrieved_attrs)
+ List **retrieved_attrs, int *values_end_len)
 {
 	AttrNumber	pindex;
 	bool		first;
@@ -1749,6 +1749,7 @@ deparseInsertSql(StringInfo buf, RangeTblEntry *rte,
 	}
 	else
 		appendStringInfoString(buf, " DEFAULT VALUES");
+	*values_end_len = buf->len;
 
 	if (doNothing)
 		appendStringInfoString(buf, " ON CONFLICT DO NOTHING");
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9c5aaacc51..f7be4bec17 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -86,8 +86,10 @@ enum FdwScanPrivateIndex
  * 1) INSERT/UPDATE/DELETE statement text to be sent to the remote server
  * 2) Integer list of target attribute numbers for INSERT/UPDATE
  *	  (NIL for a DELETE)
- * 3) Boolean flag showing if the remote query has a RETURNING clause
- * 4) Integer list of attribute numbers retrieved 

RE: POC: postgres_fdw insert batching

2020-11-17 Thread tsunakawa.ta...@fujitsu.com
Hello,


Modified the patch as I talked with Tomas-san.  The performance results of 
loading one million records into a hash-partitioned table with 8 partitions are 
as follows:

unpatched, local: 8.6 seconds
unpatched, fdw: 113.7 seconds
patched, fdw: 12.5 seconds  (9x improvement)

The test scripts are also attached.  Run prepare.sql once to set up tables and 
source data.  Run local_part.sql and fdw_part.sql to load source data into a 
partitioned table with local partitions and a partitioned table with foreign 
tables respectively.
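
The scripts are attachments; a minimal sketch of the kind of setup they
create (object names and the loopback server are assumed) is:

  CREATE TABLE t (a int, b int) PARTITION BY HASH (a);
  CREATE FOREIGN TABLE t_p0 PARTITION OF t
    FOR VALUES WITH (MODULUS 8, REMAINDER 0)
    SERVER loopback OPTIONS (table_name 'base_p0');
  -- ... likewise for remainders 1 through 7 ...
  INSERT INTO t SELECT i, i FROM generate_series(1, 1000000) i;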


Regards
Takayuki Tsunakawa



fdw.sql
Description: fdw.sql


fdw_part.sql
Description: fdw_part.sql


local.sql
Description: local.sql


local_part.sql
Description: local_part.sql


prepare.sql
Description: prepare.sql


v2-0001-Add-bulk-insert-for-foreign-tables.patch
Description: v2-0001-Add-bulk-insert-for-foreign-tables.patch


RE: POC: postgres_fdw insert batching

2020-11-11 Thread tsunakawa.ta...@fujitsu.com
From: t...@corona.is.ed.ac.uk  On Behalf Of
> Does this patch affect trigger semantics on the base table?
> 
> At the moment when I insert 1000 rows into a postgres_fdw table using a
> single insert statement (e.g. INSERT INTO fdw_foo SELECT ... FROM bar) I
> naively expect a "statement level" trigger on the base table to trigger
> once. But this is not the case. The postgres_fdw implements this
> operation as 1000 separate insert statements on the base table, so the
> trigger happens 1000 times instead of once. Hence there is no
> distinction between using a statement level and a row level trigger on
> the base table in this context.
> 
> So would this patch change the behaviour so only 10 separate insert
> statements (each of 100 rows) would be made against the base table?
> If so thats useful as it means improving performance using statement
> level triggers becomes possible. But it would also result in more
> obscure semantics and might break user processes dependent on the
> existing behaviour after the patch is applied.

Yes, the number of times the statement trigger defined on the base (remote) 
table fires will be reduced, as you said.
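
For example, with a statement-level trigger like this on the remote base
table (names assumed, just a sketch), inserting 1000 rows through the
foreign table currently raises the notice 1000 times; with a batch size of
100 it would be raised only 10 times:

  CREATE FUNCTION note_insert() RETURNS trigger
  LANGUAGE plpgsql AS $$
  BEGIN
    RAISE NOTICE 'INSERT statement on base_t';
    RETURN NULL;  -- return value of a statement-level trigger is ignored
  END $$;

  CREATE TRIGGER base_t_stmt AFTER INSERT ON base_t
    FOR EACH STATEMENT EXECUTE FUNCTION note_insert();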


> BTW is this subtlety documented, I haven't found anything but happy
> to be proved wrong?

Unfortunately, there doesn't seem to be any documentation about triggers on 
base tables.  For example, if the local foreign table has an AFTER ROW trigger 
and its remote base table has a BEFORE ROW trigger that modifies the input 
record, it seems that the AFTER ROW trigger doesn't see the modified record.


Regards
Takayuki Tsunakawa





RE: POC: postgres_fdw insert batching

2020-11-11 Thread Tim Colles

On Wed, 11 Nov 2020, tsunakawa.ta...@fujitsu.com wrote:



From: Tomas Vondra 

I see the patch builds the "bulk" query in execute_foreign_modify. IMO
that's something we should do earlier, when we're building the simple
query (for 1-row inserts). I'd understand if you were concerned about
overhead in case of 1-row inserts, trying to not plan the bulk query
until necessary, but I'm not sure this actually helps.

Or was the goal to build a query for every possible number of slots? I
don't think that's really useful, considering it requires deallocating
the old plan, preparing a new one, etc. IMO it should be sufficient to
have two queries - one for 1-row inserts, one for the full batch. The
last incomplete batch can be inserted using a loop of 1-row queries.

That's what my patch was doing, but I'm not insisting on that - it just
seems like a better approach to me. So feel free to argue why this is
better.


Don't be concerned, the processing is not changed for 1-row inserts: the INSERT 
query string is built in PlanForeignModify(), and the remote statement is 
prepared in execute_foreign_modify() during the first call to 
ExecForeignInsert() and it's reused for subsequent ExecForeignInsert() calls.

The re-creation of INSERT query string and its corresponding PREPARE happen 
when the number of tuples to be inserted is different from the previous call to 
ExecForeignInsert()/ExecForeignBulkInsert().  That's because we don't know how 
many tuples will be inserted during planning (PlanForeignModify) or execution 
(until the scan ends for SELECT).  For example, if we insert 10,030 rows with 
the bulk size 100, the flow is:

 PlanForeignModify():
   build the INSERT query string for 1 row
 ExecForeignBulkInsert(100):
   drop the INSERT query string and prepared statement for 1 row
   build the query string and prepare statement for 100 row INSERT
   execute it
 ExecForeignBulkInsert(100):
   reuse the prepared statement for 100 row INSERT and execute it
...
 ExecForeignBulkInsert(30):
   drop the INSERT query string and prepared statement for 100 row
   build the query string and prepare statement for 30 row INSERT
   execute it



I think it'd be good to have such a GUC, even if only for testing and
development. We should probably have a way to disable the batching,
which the GUC could also do, I think. So +1 to have the GUC.


OK, I'll add it.  The name would be max_bulk_insert_tuples, because a) it might cover bulk insert for local 
relations in the future, and b) "tuple" is used in cpu_(index_)tuple_cost and parallel_tuple_cost, 
while "row" or "record" is not used in GUC (except for row_security).

The valid range would be between 1 and 1,000 (I said 10,000 previously, but I 
think that was an overreaction, and I'm a bit worried about unforeseen trouble 
that too many tuples might cause.)  1 disables the bulk processing and uses the 
traditional ExecForeignInsert().  The default value is 100 (would 1 be sensible 
as a default value to avoid surprising users by increased memory usage?)



2. Should we accumulate records per relation in ResultRelInfo
instead? That is, when inserting into a partitioned table that has
foreign partitions, delay insertion until a certain number of input
records accumulate, and then insert accumulated records per relation
(e.g., 50 records to relation A, 30 records to relation B, and 20
records to relation C.)  If we do that,



I think there's a chunk of text missing here? If we do that, then what?


Sorry, the two bullets below there are what follows.  Perhaps I should have written ":" 
instead of ",".



Anyway, I don't see why accumulating the records in ResultRelInfo would
be better than what the patch does now. It seems to me fairly
specific to FDWs, so keeping it in FDW state seems appropriate. What
would be the advantage of stashing it in ResultRelInfo?


I thought of distributing input records to their corresponding partitions' 
ResultRelInfos.  For example, an input record for partition 1 comes, store it in 
the ResultRelInfo for partition 1; then an input record for partition 2 comes, 
store it in the ResultRelInfo for partition 2.  When a ResultRelInfo 
accumulates some number of rows, insert the accumulated rows therein into the 
partition.  When the input ends, perform bulk inserts for ResultRelInfos that 
have accumulated rows.




I think that's OK for most use cases, and if it's not (e.g. when there's
something requiring the exact order of writes) then it's not possible to
use batching. That's one of the reasons why I think we should have a GUC
to disable the batching.


Agreed.



* Should the maximum count of accumulated records be applied per
relation or the query? When many foreign partitions belong to a
partitioned table, if the former is chosen, it may use much memory in
total.  If the latter is chosen, the records per relation could be few
and thus the benefit of bulk insert could be small.

RE: POC: postgres_fdw insert batching

2020-11-10 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> I see the patch builds the "bulk" query in execute_foreign_modify. IMO
> that's something we should do earlier, when we're building the simple
> query (for 1-row inserts). I'd understand if you were concerned about
> overhead in case of 1-row inserts, trying to not plan the bulk query
> until necessary, but I'm not sure this actually helps.
> 
> Or was the goal to build a query for every possible number of slots? I
> don't think that's really useful, considering it requires deallocating
> the old plan, preparing a new one, etc. IMO it should be sufficient to
> have two queries - one for 1-row inserts, one for the full batch. The
> last incomplete batch can be inserted using a loop of 1-row queries.
> 
> That's what my patch was doing, but I'm not insisting on that - it just
> seems like a better approach to me. So feel free to argue why this is
> better.

Don't be concerned, the processing is not changed for 1-row inserts: the INSERT 
query string is built in PlanForeignModify(), and the remote statement is 
prepared in execute_foreign_modify() during the first call to 
ExecForeignInsert() and it's reused for subsequent ExecForeignInsert() calls.

The re-creation of INSERT query string and its corresponding PREPARE happen 
when the number of tuples to be inserted is different from the previous call to 
ExecForeignInsert()/ExecForeignBulkInsert().  That's because we don't know how 
many tuples will be inserted during planning (PlanForeignModify) or execution 
(until the scan ends for SELECT).  For example, if we insert 10,030 rows with 
the bulk size 100, the flow is:

  PlanForeignModify():
build the INSERT query string for 1 row
  ExecForeignBulkInsert(100):
drop the INSERT query string and prepared statement for 1 row
build the query string and prepare statement for 100 row INSERT
execute it
  ExecForeignBulkInsert(100):
reuse the prepared statement for 100 row INSERT and execute it
...
  ExecForeignBulkInsert(30):
drop the INSERT query string and prepared statement for 100 row
build the query string and prepare statement for 30 row INSERT
execute it
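
In terms of the remote statements, that flow corresponds to something like
the following (table and column names assumed; postgres_fdw prepares these
at the protocol level rather than with SQL PREPARE):

  -- built by PlanForeignModify(), prepared on first use for single rows
  INSERT INTO base_t(a, b) VALUES ($1, $2)

  -- re-prepared on the first ExecForeignBulkInsert(100)
  INSERT INTO base_t(a, b) VALUES ($1, $2), ($3, $4), ..., ($199, $200)

  -- re-prepared for the final partial batch of 30
  INSERT INTO base_t(a, b) VALUES ($1, $2), ($3, $4), ..., ($59, $60)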


> I think it'd be good to have such a GUC, even if only for testing and
> development. We should probably have a way to disable the batching,
> which the GUC could also do, I think. So +1 to have the GUC.

OK, I'll add it.  The name would be max_bulk_insert_tuples, because a) it might 
cover bulk insert for local relations in the future, and b) "tuple" is used in 
cpu_(index_)tuple_cost and parallel_tuple_cost, while "row" or "record" is not 
used in GUC names (except for row_security).

The valid range would be between 1 and 1,000 (I said 10,000 previously, but I 
think that was an overreaction, and I'm a bit worried about unforeseen trouble 
that too many tuples might cause.)  1 disables the bulk processing and uses the 
traditional ExecForeignInsert().  The default value is 100 (would 1 be sensible 
as a default value to avoid surprising users by increased memory usage?)
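
Usage would then simply be (GUC name as proposed above):

  SET max_bulk_insert_tuples = 1;    -- disable bulk processing
  SET max_bulk_insert_tuples = 100;  -- accumulate up to 100 tuples per INSERT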


> > 2. Should we accumulate records per relation in ResultRelInfo
> > instead? That is, when inserting into a partitioned table that has
> > foreign partitions, delay insertion until a certain number of input
> > records accumulate, and then insert accumulated records per relation
> > (e.g., 50 records to relation A, 30 records to relation B, and 20
> > records to relation C.)  If we do that,
> >
> 
> I think there's a chunk of text missing here? If we do that, then what?

Sorry, the two bullets below there are what follows.  Perhaps I should have 
written ":" instead of ",".


> Anyway, I don't see why accumulating the records in ResultRelInfo would
> be better than what the patch does now. It seems to me fairly
> specific to FDWs, so keeping it in FDW state seems appropriate. What
> would be the advantage of stashing it in ResultRelInfo?

I thought of distributing input records to their corresponding partitions' 
ResultRelInfos.  For example, an input record for partition 1 comes, store it in 
the ResultRelInfo for partition 1; then an input record for partition 2 comes, 
store it in the ResultRelInfo for partition 2.  When a ResultRelInfo 
accumulates some number of rows, insert the accumulated rows therein into the 
partition.  When the input ends, perform bulk inserts for ResultRelInfos that 
have accumulated rows.



> I think that's OK for most use cases, and if it's not (e.g. when there's
> something requiring the exact order of writes) then it's not possible to
> use batching. That's one of the reasons why I think we should have a GUC
> to disable the batching.

Agreed.


> > * Should the maximum count of accumulated records be applied per
> > relation or the query? When many foreign partitions belong to a
> > partitioned table, if the former is chosen, it may use much memory in
> > total.  If the latter is chosen, the records per relation could be
> > few and thus the benefit of bulk insert could be small.
> >
> 
> I think it needs to be applied per relation, because that's the level at
> which we can do it easily and consistently.

Re: POC: postgres_fdw insert batching

2020-11-10 Thread Tomas Vondra



On 11/10/20 4:05 PM, Tomas Vondra wrote:
> Hi,
> 
> Thanks for working on this!
> 
> On 11/10/20 1:45 AM, tsunakawa.ta...@fujitsu.com wrote:
>> Hello,
>>
>>
>> The attached patch implements the new bulk insert routine for
>> postgres_fdw and the executor utilizing it.  It passes make
>> check-world.
>>
> 
> I haven't done any testing yet, just a quick review.
> 
> I see the patch builds the "bulk" query in execute_foreign_modify. IMO
> that's something we should do earlier, when we're building the simple
> query (for 1-row inserts). I'd understand if you were concerned about
> overhead in case of 1-row inserts, trying to not plan the bulk query
> until necessary, but I'm not sure this actually helps.
> 
> Or was the goal to build a query for every possible number of slots? I
> don't think that's really useful, considering it requires deallocating
> the old plan, preparing a new one, etc. IMO it should be sufficient to
> have two queries - one for 1-row inserts, one for the full batch. The
> last incomplete batch can be inserted using a loop of 1-row queries.
> 
> That's what my patch was doing, but I'm not insisting on that - it just
> seems like a better approach to me. So feel free to argue why this is
> better.
> 
> 
>> I measured performance in a basic non-partitioned case by modifying
>> Tomas-san's scripts.  They perform an INSERT SELECT statement that
>> copies one million records.  The table consists of two integer
>> columns, with a primary key on one of them.  You can run the
>> attached prepare.sql to set up once.  local.sql inserts to the table
>> directly, while fdw.sql inserts through a foreign table.
>>
>> The performance results, the average time of 5 runs, were as follows
>> on a Linux host where the average round-trip time of "ping localhost"
>> was 34 us:
>>
>> master, local: 6.1 seconds
>> master, fdw: 125.3 seconds
>> patched, fdw: 11.1 seconds (11x improvement)
>>
> 
> Nice. I think we can't really get much closer to local master, so 6.1
> vs. 11.1 seconds look quite acceptable.
> 
>>
>> The patch accumulates at most 100 records in ModifyTableState before
>> inserting in bulk.  Also, when an input record is targeted for a
>> different relation (= partition) than that for already accumulated
>> records, insert the accumulated records and store the new record for
>> later insert.
>>
>> [Issues]
>>
>> 1. Do we want a GUC parameter, say, max_bulk_insert_records =
>> (integer), to control the number of records inserted at once? The
>> range of allowed values would be between 1 and 1,000.  1 disables
>> bulk insert. The possible reason of the need for this kind of
>> parameter would be to limit the amount of memory used for accumulated
>> records, which could be prohibitively large if each record is big.  I
>> don't think this is a must, but I think we can have it.
>>
> 
> I think it'd be good to have such a GUC, even if only for testing and
> development. We should probably have a way to disable the batching,
> which the GUC could also do, I think. So +1 to have the GUC.
> 
>> 2. Should we accumulate records per relation in ResultRelInfo
>> instead? That is, when inserting into a partitioned table that has
>> foreign partitions, delay insertion until a certain number of input
>> records accumulate, and then insert accumulated records per relation
>> (e.g., 50 records to relation A, 30 records to relation B, and 20
>> records to relation C.)  If we do that,
>>
> 
> I think there's a chunk of text missing here? If we do that, then what?
> 
> Anyway, I don't see why accumulating the records in ResultRelInfo would
> be better than what the patch does now. It seems to me fairly
> specific to FDWs, so keeping it in FDW state seems appropriate. What
> would be the advantage of stashing it in ResultRelInfo?
> 
>> * The order of insertion differs from the order of input records.  Is
>> it OK?
>>
> 
> I think that's OK for most use cases, and if it's not (e.g. when there's
> something requiring the exact order of writes) then it's not possible to
> use batching. That's one of the reasons why I think we should have a GUC
> to disable the batching.
> 
>> * Should the maximum count of accumulated records be applied per
>> relation or the query? When many foreign partitions belong to a
>> partitioned table, if the former is chosen, it may use much memory in
>> total.  If the latter is chosen, the records per relation could be
>> few and thus the benefit of bulk insert could be small.
>>
> 
> I think it needs to be applied per relation, because that's the level at
> which we can do it easily and consistently. The whole point is to send
> data in sufficiently large chunks to minimize the communication overhead
> (latency etc.), but if you enforce it "per query" that seems hard.
> 
> Imagine you're inserting data into a table with many partitions - how do
> you pick the number of rows to accumulate? The table may have 10 or 1000
> partitions, we may be inserting into all partitions or just a small
> subset, not all partitions may be foreign, etc. It seems pretty
> difficult to pick and enforce a reliable limit at the query level. But
> maybe I'm missing something and it's easier than I think?

Re: POC: postgres_fdw insert batching

2020-11-10 Thread Tomas Vondra
Hi,

Thanks for working on this!

On 11/10/20 1:45 AM, tsunakawa.ta...@fujitsu.com wrote:
> Hello,
> 
> 
> The attached patch implements the new bulk insert routine for
> postgres_fdw and the executor utilizing it.  It passes make
> check-world.
> 

I haven't done any testing yet, just a quick review.

I see the patch builds the "bulk" query in execute_foreign_modify. IMO
that's something we should do earlier, when we're building the simple
query (for 1-row inserts). I'd understand if you were concerned about
overhead in case of 1-row inserts, trying to not plan the bulk query
until necessary, but I'm not sure this actually helps.

Or was the goal to build a query for every possible number of slots? I
don't think that's really useful, considering it requires deallocating
the old plan, preparing a new one, etc. IMO it should be sufficient to
have two queries - one for 1-row inserts, one for the full batch. The
last incomplete batch can be inserted using a loop of 1-row queries.

That's what my patch was doing, but I'm not insisting on that - it just
seems like a better approach to me. So feel free to argue why this is
better.


> I measured performance in a basic non-partitioned case by modifying
> Tomas-san's scripts.  They perform an INSERT SELECT statement that
> copies one million records.  The table consists of two integer
> columns, with a primary key on one of them.  You can run the
> attached prepare.sql to set up once.  local.sql inserts to the table
> directly, while fdw.sql inserts through a foreign table.
> 
> The performance results, the average time of 5 runs, were as follows
> on a Linux host where the average round-trip time of "ping localhost"
> was 34 us:
> 
> master, local: 6.1 seconds
> master, fdw: 125.3 seconds
> patched, fdw: 11.1 seconds (11x improvement)
> 

Nice. I think we can't really get much closer to local master, so 6.1
vs. 11.1 seconds look quite acceptable.

> 
> The patch accumulates at most 100 records in ModifyTableState before
> inserting in bulk.  Also, when an input record is targeted for a
> different relation (= partition) than that for already accumulated
> records, insert the accumulated records and store the new record for
> later insert.
> 
> [Issues]
> 
> 1. Do we want a GUC parameter, say, max_bulk_insert_records =
> (integer), to control the number of records inserted at once? The
> range of allowed values would be between 1 and 1,000.  1 disables
> bulk insert. The possible reason of the need for this kind of
> parameter would be to limit the amount of memory used for accumulated
> records, which could be prohibitively large if each record is big.  I
> don't think this is a must, but I think we can have it.
> 

I think it'd be good to have such a GUC, even if only for testing and
development. We should probably have a way to disable the batching,
which the GUC could also do, I think. So +1 to have the GUC.

> 2. Should we accumulate records per relation in ResultRelInfo
> instead? That is, when inserting into a partitioned table that has
> foreign partitions, delay insertion until a certain number of input
> records accumulate, and then insert accumulated records per relation
> (e.g., 50 records to relation A, 30 records to relation B, and 20
> records to relation C.)  If we do that,
> 

I think there's a chunk of text missing here? If we do that, then what?

Anyway, I don't see why accumulating the records in ResultRelInfo would
be better than what the patch does now. It seems to me fairly
specific to FDWs, so keeping it in FDW state seems appropriate. What
would be the advantage of stashing it in ResultRelInfo?

> * The order of insertion differs from the order of input records.  Is
> it OK?
> 

I think that's OK for most use cases, and if it's not (e.g. when there's
something requiring the exact order of writes) then it's not possible to
use batching. That's one of the reasons why I think we should have a GUC
to disable the batching.

> * Should the maximum count of accumulated records be applied per
> relation or the query? When many foreign partitions belong to a
> partitioned table, if the former is chosen, it may use much memory in
> total.  If the latter is chosen, the records per relation could be
> few and thus the benefit of bulk insert could be small.
> 

I think it needs to be applied per relation, because that's the level at
which we can do it easily and consistently. The whole point is to send
data in sufficiently large chunks to minimize the communication overhead
(latency etc.), but if you enforce it "per query" that seems hard.

Imagine you're inserting data into a table with many partitions - how do
you pick the number of rows to accumulate? The table may have 10 or 1000
partitions, we may be inserting into all partitions or just a small
subset, not all partitions may be foreign, etc. It seems pretty
difficult to pick and enforce a reliable limit at the query level. But
maybe I'm missing something and it's easier than I think?

Of

RE: POC: postgres_fdw insert batching

2020-11-09 Thread tsunakawa.ta...@fujitsu.com
Hello,


The attached patch implements the new bulk insert routine for postgres_fdw and 
the executor utilizing it.  It passes make check-world.

I measured performance in a basic non-partitioned case by modifying Tomas-san's 
scripts.  They perform an INSERT SELECT statement that copies one million 
records.  The table consists of two integer columns, with a primary key on one 
of them.  You can run the attached prepare.sql to set up once.  local.sql 
inserts into the table directly, while fdw.sql inserts through a foreign table.

The performance results, the average time of 5 runs, were as follows on a 
Linux host where the average round-trip time of "ping localhost" was 34 us:

master, local: 6.1 seconds
master, fdw: 125.3 seconds
patched, fdw: 11.1 seconds (11x improvement)


The patch accumulates at most 100 records in ModifyTableState before inserting 
in bulk.  Also, when an input record targets a different relation (= 
partition) than the already accumulated records, the accumulated records are 
inserted first and the new record is stored for a later insert.
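In pseudocode terms, that executor-side logic amounts to something like the 
sketch below.  BufferTuple()/FlushTuples() and the mt_buffered* fields are 
hypothetical names for illustration, not identifiers from the patch; 
RETURNING and error handling are omitted:

#define MAX_BUFFERED_TUPLES 100

static void
accumulate_or_flush(ModifyTableState *mtstate,
                    ResultRelInfo *resultRelInfo,
                    TupleTableSlot *slot)
{
    /* A row bound for a different relation flushes the current batch. */
    if (mtstate->mt_buffered > 0 &&
        resultRelInfo != mtstate->mt_buffered_rel)
        FlushTuples(mtstate);

    /* Stash the row instead of sending it immediately. */
    BufferTuple(mtstate, resultRelInfo, slot);

    /* A full buffer goes to the remote server in one round trip. */
    if (mtstate->mt_buffered >= MAX_BUFFERED_TUPLES)
        FlushTuples(mtstate);
}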

[Issues]

1. Do we want a GUC parameter, say, max_bulk_insert_records = (integer), to 
control the number of records inserted at once?
The range of allowed values would be between 1 and 1,000, with 1 disabling 
bulk insert.
A possible reason for needing this kind of parameter would be to limit the 
amount of memory used for accumulated records, which could be prohibitively 
large if each record is big.  I don't think this is a must, but I think we can 
have it.

2. Should we accumulate records per relation in ResultRelInfo instead?
That is, when inserting into a partitioned table that has foreign partitions, 
delay insertion until a certain number of input records accumulate, and then 
insert accumulated records per relation (e.g., 50 records to relation A, 30 
records to relation B, and 20 records to relation C.)  If we do that,

* The order of insertion differs from the order of input records.  Is it OK?

* Should the maximum count of accumulated records be applied per relation or 
per query?
When many foreign partitions belong to a partitioned table, the former may use 
much memory in total, while with the latter the records per relation could be 
few and thus the benefit of bulk insert could be small.
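To put rough numbers on the tradeoff: assuming, say, 1 KB rows, the proposed 
1,000-row cap, and 100 foreign partitions, a per-relation limit could buffer 
up to about 1,000 x 1 KB x 100 = 100 MB in total, whereas a per-query cap of 
1,000 rows spread evenly over those 100 partitions would leave only ~10 rows 
per batch.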


Regards
Takayuki Tsunakawa



Attachments: prepare.sql, local.sql, fdw.sql, 
v1-0001-Add-bulk-insert-for-foreign-tables.patch


RE: POC: postgres_fdw insert batching

2020-10-08 Thread tsunakawa.ta...@fujitsu.com
From: Tomas Vondra 
> I'm not sure when I'll have time to work on this again, so if you are
> interested and willing to work on it, please go ahead. I'll gladly do
> reviews and help you with it.

Thank you very much.


> I think transferring data to other databases is fine - interoperability
> is a big advantage for users, I don't see it as something threatening
> the PostgreSQL project. I doubt this would make it more likely for users
> to migrate from PostgreSQL - there are many ways to do that already.

Definitely true.  Users may want to use INSERT SELECT to do some data 
transformation in their OLTP database and load it into a non-Postgres data 
warehouse.


> Yeah. I think handling complete failure should not be very difficult,
> but there are cases that worry me more. For example, what if there's a
> before trigger (on the remote db) that "skips" inserting some of the
> rows by returning NULL?

> Yeah. I wonder if the FDW needs to indicate which features are supported
> by the ExecForeignMultiInsert, e.g. by adding a function that decides
> whether batch insert is supported (it might also do that internally by
> calling ExecForeignInsert, of course).

Thanks for your advice.  I'll try to address them.
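As one possible shape for that capability check (purely a sketch, not an 
agreed API): a per-relation callback returning the batch size usable for the 
relation, where 1 means "no batching", e.g. because per-row semantics are 
required:

#include "postgres.h"
#include "nodes/execnodes.h"
#include "utils/reltrigger.h"

/* Sketch only: decide whether batching is safe for this target relation. */
static int
postgresGetBatchSize(ResultRelInfo *resultRelInfo)
{
    /* RETURNING or AFTER ROW triggers need per-row round trips. */
    if (resultRelInfo->ri_projectReturning != NULL ||
        (resultRelInfo->ri_TrigDesc &&
         resultRelInfo->ri_TrigDesc->trig_insert_after_row))
        return 1;

    return 100;                 /* otherwise accumulate up to 100 rows */
}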


Regards
Takayuki Tsunakawa




