Re: postgres_fdw: Use COPY to speed up batch inserts

Masahiko Sawada Wed, 25 Feb 2026 17:41:06 -0800

On Thu, Jan 29, 2026 at 6:02 AM Matheus Alcantara
<[email protected]> wrote:
>
> On Tue Jan 27, 2026 at 4:17 PM -03, Masahiko Sawada wrote:
> > On Fri, Jan 2, 2026 at 12:33 PM Matheus Alcantara
> > <[email protected]> wrote:
> >>
> >> On Fri Jan 2, 2026 at 5:15 PM -03, Masahiko Sawada wrote:
> >> > +
> >> > +   /*
> >> > +    * Set batch_with_copy_threshold from foreign server/table options. 
> >> > We do
> >> > +    * this outside of create_foreign_modify() because we only want to 
> >> > use
> >> > +    * COPY as a remote SQL when a COPY FROM on a foreign table is 
> >> > executed or
> >> > +    * an insert is being performed on a table partition. In both cases 
> >> > the
> >> > +    * BeginForeignInsert fdw routine is called.
> >> > +    */
> >> > +   fmstate->batch_with_copy_threshold = 
> >> > get_batch_with_copy_threshold(rel);
> >> >
> >> > Does it mean that we could end up using the COPY method not only when
> >> > executing COPY FROM but also when executing INSERT with tuple
> >> > routings? If so, how does the EXPLAIN command show the remote SQL?
> >> >
> >> It meas that we could also use the COPY method to insert rows into a
> >> specific table partition that is a foreign table.
> >>
> >> Let's say that an user execute an INSERT INTO on a partitioned table
> >> that has partitions that are postgres_fdw tables, with this patch we
> >> could use the COPY method to insert the rows on these partitions. On
> >> this scenario we would not have issue with EXPLAIN output because
> >> currently we do not show the remote SQL being executed on each partition
> >> that is involved on the INSERT statement.
> >>
> >> If an user execute an INSERT directly into a postgres_fdw table we will
> >> use the normal INSERT statement as we use today.
> >
> > I'm slightly concerned that it could be confusing for users if we use
> > the COPY method for the same table based on not only
> > batch_with_copy_threshold but also how to INSERT. For example, if we
> > insert tuples directly to a leaf partition, we always use INSERT. On
> > the other hand, if we insert tuples via its parent table, we would use
> > either COPY or INSERT based on the number of tuples and
> > batch_with_copy_threshold value. IIUC this behavior stems from FDW API
> > design (BeginForeignInsert callback is called only in cases of COPY or
> > tuple routing), which users would not be aware of in general. Also,
> > inserting tuples directly to a leaf partition is faster in general
> > than doing via the parent table, but the COPY method optimization is
> > available only in the latter case.
>
> Yeah, I agree that this patch ends up in a land that it could introduce
> more confusing than improvements for the user.
>
> > How about making use of COPY method only when users execute a COPY
> > command? Which seems more intuitive and a good start. We can
> > distinguish BeginForeignInsert called via COPY from called via INSERT
> > (tuple routing) by adding a flag to ModifyTableState or by checking if
> > the passed resultRelInfo == resultRelInfo->ri_RootResultRelInfo.
>
> This sounds a good idea, it simplify the patch scope a lot. During my
> tests I've noticed that ri_RootResultRelInfo is null when it's being
> called by CopyFrom(), so on postgresBeginForeignInsert I've included a
> check that if it's null it means that it's being executed by a COPY
> command and then we could use the COPY command as remote SQL.


Thank you for updating the patch!

> Note that using COPY as the remote SQL is not always feasible. If the
> remote table has a trigger that modifies the row, and the local foreign
> table also has an insert trigger, we cannot capture those changes. While
> postgres_fdw typically relies on INSERT ... RETURNING * to synchronize
> the TupleTableSlot with remote side effects, the COPY command does not
> support a RETURNING clause. Without this synchronization, local triggers
> would see the original data rather than the actual values inserted. This
> limitation is why the ri_TrigDesc == NULL check is necessary; removing
> it causes the "Test a combination of local and remote triggers"
> regression test on postgres_fdw.sql to fail.

Agreed. If this problem happens only when the local table has an AFTER
INSERT trigger, can we check ri_TrigDesc->trig_insert_after_row too?

Regarding the third condition, resultRelInfo->ri_returningList == NIL,
can we make it an Assert() because checking
resultRelInfo->RootResultRelInfo == NULL already checks if it's called
via COPY?

One thing it might be worth considering is to add some regression
tests verifying that COPY commands are actually being used on the
remote server in success cases. That way, we can be aware of changes
even if we change the assumption in the future that RootResultRelInfo
== NULL only when postgresBeginForeignInsert() is called via COPY. One
idea is to define a trigger on the remote server that checks if the
executed query is INSERT or COPY. For example,

create function insert_or_copy() returns trigger as $$
declare query text;
begin
    query := current_query();
    if query ~* '^COPY' then
        raise notice 'COPY command';
    elsif query ~* '^INSERT' then
        raise notice 'INSERT command';
    end if;
return new;
end;
$$ language plpgsql;

Note that we need to set client_min_message to 'log' so that we can
write the notice message raised via postgres_fdw.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Re: postgres_fdw: Use COPY to speed up batch inserts

Reply via email to