Re: POC: postgres_fdw insert batching

David Fetter Mon, 30 Nov 2020 00:29:26 -0800

On Wed, Nov 25, 2020 at 05:04:36AM +0000, [email protected] wrote:
> From: Tomas Vondra <[email protected]>
> > On 11/24/20 9:45 AM, [email protected] wrote:
> > > OTOH, as for the name GetModifyBatchSize() you suggest, I think
> > GetInsertBatchSize may be better.  That is, this API deals with multiple
> > records in a single INSERT statement.  Your GetModifyBatchSize will be
> > reserved for statement batching when libpq has supported batch/pipelining to
> > execute multiple INSERT/UPDATE/DELETE statements, as in the following
> > JDBC batch updates.  What do you think?
> > >
> > 
> > I don't know. I was really only thinking about batching in the context
> > of a single DML command, not about batching of multiple commands at the
> > protocol level. IMHO it's far more likely we'll add support for batching
> > for DELETE/UPDATE than libpq pipelining, which seems rather different
> > from how the FDW API works. Which is why I was suggesting to use a name
> > that would work for all DML commands, not just for inserts.
> 
> Right, I can't imagine now how the interaction among the client, server core 
> and FDWs would be regarding the statement batching.  So I'll take your 
> suggested name.
> 
> 
> > Not sure, but I'd guess knowing whether batching is used would be
> > useful. We only print the single-row SQL query, which kinda gives the
> > impression that there's no batching.
> 
> Added in postgres_fdw like "Remote SQL" when EXPLAIN VERBOSE is run.
> 
> 
> > > Don't worry about this, too.  GetMaxBulkInsertTuples() just returns a 
> > > value
> > that was already saved in a struct in create_foreign_modify().
> > >
> > 
> > Well, I do worry for two reasons.
> > 
> > Firstly, the fact that in postgres_fdw the call is cheap does not mean
> > it'll be like that in every other FDW. Presumably, the other FDWs might
> > cache it in the struct and do the same thing, of course.
> > 
> > But the fact that we're calling it over and over for each row kinda
> > seems like we allow the value to change during execution, but I very
> > much doubt the code is expecting that. I haven't tried, but assume the
> > function first returns 10 and then 100. ISTM the code will allocate
> > ri_Slots with 25 slots, but then we'll try stashing 100 tuples there.
> > That can't end well. Sure, we can claim it's a bug in the FDW extension,
> > but it's also due to the API design.
> 
> You worried about other FDWs than postgres_fdw.  That's reasonable.  I 
> insisted in other threads that PG developers care only about postgres_fdw, 
> not other FDWs, when designing the FDW interface, but I myself made the same 
> mistake.  I made changes so that the executor calls GetModifyBatchSize() once 
> per relation per statement.


Please pardon me for barging in late in this discussion, but if we're
going to be using a bulk API here, wouldn't it make more sense to use
COPY, except where RETURNING is specified, in place of INSERT?

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: POC: postgres_fdw insert batching

Reply via email to