Hi all. I thought I'd share some experience from Npgsql regarding
batching/pipelining - hope this isn't off-topic.

Npgsql has supported batching for quite a while, similar to what this patch
proposes - with a single Sync message sent at the end.

It has recently come to my attention that this implementation is
problematic because it forces the batch to occur within a transaction; in
other words, there's no option for a non-transactional batch. This can be
a problem for several reasons: users may want to send off a batch of
inserts, not caring whether one of them fails (e.g. because of a unique
constraint violation). In other words, in some scenarios it may be
appropriate for later batched statements to be executed even when an
earlier batched statement raised an error. If Sync is only sent at the
very end, this isn't possible. Another example of a problem (which
actually happened) is that transactions acquire row-level locks, and so
may trigger deadlocks if two different batches update the same rows in
reverse order. Neither of these issues would occur if the batch weren't
implicitly transactional.

My current plan is to modify the batch implementation based on whether
we're in an (explicit) transaction or not. If we're in a transaction, then
it makes perfect sense to send a single Sync at the end as is being
proposed here - any failure would cause the transaction to fail anyway, so
skipping all subsequent statements until the batch's end makes sense.
However, if we're not in an explicit transaction, I plan to insert a Sync
message after each individual Execute, making non-transactional batched
statements more or less identical in behavior to non-transactional
unbatched statements. Note that this means that a batch can generate
multiple errors, not just one.
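To make the plan concrete, here's a rough sketch (in Python pseudocode,
not Npgsql's actual C# code) of the frontend message sequences I have in
mind; the message names mirror the PostgreSQL extended query protocol:

```python
def batch_messages(statements, in_explicit_transaction):
    """Return the frontend message sequence for a batch of statements.

    Hypothetical sketch: inside an explicit transaction, one Sync closes
    the whole batch; outside one, a Sync follows each Execute so each
    statement gets its own implicit transaction.
    """
    msgs = []
    for stmt in statements:
        msgs += ["Parse(%s)" % stmt, "Bind", "Execute"]
        if not in_explicit_transaction:
            # Sync after each Execute: an error in one statement doesn't
            # cause the backend to skip the statements that follow it.
            msgs.append("Sync")
    if in_explicit_transaction:
        # Single Sync at the end: on error the backend skips everything
        # up to the Sync, which is fine since the transaction is doomed
        # anyway.
        msgs.append("Sync")
    return msgs
```

So a two-statement batch outside a transaction becomes Parse/Bind/Execute/Sync,
Parse/Bind/Execute/Sync, behaving like two unbatched statements while still
saving the round trips.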

I'm sharing this since it may be relevant to the libpq batching
implementation as well, and also to get any feedback regarding how Npgsql
should act.
