From: Tomas Vondra <tomas.von...@enterprisedb.com>
> 1) We're calling it "batch_size" but the API function is named
> postgresGetMaxBulkInsertTuples(). Perhaps we should rename the function
> to postgresGetModifyBatchSize()? That has the advantage it'd work if we
> ever add support for batching to UPDATE/DELETE.

Actually, I was in two minds about whether "batch" or "bulk" is the better term.  Oracle uses "bulk insert" and "bulk fetch", as in FETCH cur BULK COLLECT INTO array and FORALL in array INSERT INTO, while JDBC uses "batch", as in "batch updates" and its API method names (addBatch, executeBatch).

But "batch" seems to be the better and more common choice, judging from the etymology and the following Stack Exchange page:

https://english.stackexchange.com/questions/141884/which-is-a-better-and-commonly-used-word-bulk-or-batch

OTOH, as for the name GetModifyBatchSize() you suggest, I think GetInsertBatchSize may be better, because this API deals with multiple records in a single INSERT statement.  GetModifyBatchSize could be reserved for statement batching, once libpq supports batching/pipelining to execute multiple INSERT/UPDATE/DELETE statements, as in the following JDBC batch updates.  What do you think?

CODE EXAMPLE 14-1 Creating and executing a batch of insert statements 
--------------------------------------------------
Statement stmt = con.createStatement(); 
stmt.addBatch("INSERT INTO employees VALUES (1000, 'Joe Jones')"); 
stmt.addBatch("INSERT INTO departments VALUES (260, 'Shoe')"); 
stmt.addBatch("INSERT INTO emp_dept VALUES (1000, 260)"); 

// submit a batch of update commands for execution 
int[] updateCounts = stmt.executeBatch(); 
--------------------------------------------------


> 2) Do we have to lookup the batch_size in create_foreign_modify (in
> server/table options)? I'd have expected to look it up while planning
> the modify and then pass it through the list, just like the other
> FdwModifyPrivateIndex stuff. But maybe that's not possible.

Don't worry: create_foreign_modify() is called from PlanForeignModify() during 
planning.  Unfortunately, it's also called from BeginForeignInsert(), but the 
other stuff passed to create_foreign_modify(), including the query string, is 
constructed there as well.


> 3) That reminds me - should we show the batching info on EXPLAIN? That
> seems like a fairly interesting thing to show to the user. Perhaps
> showing the average batch size would also be useful? Or maybe not, we
> create the batches as large as possible, with the last one smaller.

Hmm, maybe batch_size doesn't belong in EXPLAIN, because its value doesn't 
change dynamically based on the planning or system state, unlike shared buffer 
usage and parallel workers.  OTOH, I sometimes want to see which configuration 
parameter values the user has set, such as work_mem, enable_*, and 
shared_buffers, together with the query plan (in EXPLAIN and auto_explain).  
For example, it would be nice if EXPLAIN (parameters on) could do that.  
Relevant FDW-related parameters could be included in that output.

> 4) It seems that ExecInsert executes GetMaxBulkInsertTuples() over and
> over for every tuple. I don't know it that has measurable impact, but it
> seems a bit excessive IMO. I don't think we should support the batch
> size changing during execution (seems tricky).

Don't worry about this either.  GetMaxBulkInsertTuples() just returns a value 
that was already saved in a struct in create_foreign_modify().


Regards
Takayuki Tsunakawa
