On 12 October 2016 at 19:51, Shay Rojansky <r...@roji.org> wrote: > Hi all. I thought I'd share some experience from Npgsql regarding > batching/pipelining - hope this isn't off-topic.
Not at all. > Npgsql has supported batching for quite a while, similar to what this patch > proposes - with a single Sync message is sent at the end. PgJDBC does too. The benefits of it there are what prompted me to do this, not only so libpq users can use it directly but so psqlODBC, psycopg2, etc can benefit from it if they choose to expose it. > It has recently come to my attention that this implementation is problematic > because it forces the batch to occur within a transaction; in other words, > there's no option for a non-transactional batch. That's not strictly the case. If you explicitly BEGIN and COMMIT, those operations are honoured within the batch. What's missing is implicit transactions. Usually if you send a series of messages they will each get their own implicit transaction. If you batch them, the whole lot gets an implicit transaction. This is because the PostgreSQL v3 protocol conflates transaction delineation with protocol error recovery into a single Sync message type. It's very similar to the behaviour of multi-statements, where: psql -c "CREATE TABLE fred(x integer); DROP TABLE notexists;" doesn't create "fred", but psql -c "BEGIN; CREATE TABLE fred(x integer); COMMIT; BEGIN; DROP TABLE notexists; COMMMIT;" ... does. And in fact it's for almost the same reason. They're sent as a single SimpleQuery message by psql and split up client side, but the effect is the same as two separate Query messages followed by a Sync. It isn't simple to manage this client-side, because libpq doesn't know whether a given command string may contain transaction delimiting statements or not and can't reliably look for them without client-side parsing of the SQL. So it can't dispatch its own BEGIN/COMMIT around statements in a batch that it thinks might be intended to be autocommit, and anyway that'd result in sending 3 queries for every 1 client query, which would suck. If the mythical v4 protocol ever happens I'd want to split Sync into two messages, one which is a protocol synchronisation message and another that is a transaction delimiter. Or give it flags or whatever. In the mean time: > This can be a problem for > several reasons: users may want to sent off a batch of inserts, not caring > whether one of them fails (e.g. because of a unique constraint violation). > In other words, in some scenarios it may be appropriate for later batched > statements to be executed when an earlier batched statement raised an error. > If Sync is only sent at the very end, this isn't possible. Right, and that remains the case even with explicit transaction delineation, because the first ERROR causes processing of all subsequent messages to be skipped. The design I have in libpq allows for this by allowing the client to delimit batches without ending batch mode, concurrently consuming a stream of multiple batches. Each endbatch is a Sync. So a client that wants autocommit-like behavour can send a series of 1-query batches. I think I'll need to document this a bit better since it's more subtle than I properly explained. > Another example > of a problem (which actually happened) is that transactions acquire > row-level locks, and so may trigger deadlocks if two different batches > update the same rows in reverse order. Both of these issues wouldn't occur > if the batch weren't implicitly batched. Same solution as above. > My current plan is to modify the batch implementation based on whether we're > in an (explicit) transaction or not. If we're in a transaction, then it > makes perfect sense to send a single Sync at the end as is being proposed > here - any failure would cause the transaction to fail anyway, so skipping > all subsequent statements until the batch's end makes sense. However, if > we're not in an explicit transaction, I plan to insert a Sync message after > each individual Execute, making non-transactional batched statements more or > less identical in behavior to non-transactional unbatched statements. Note > that this mean that a batch can generate multiple errors, not just one. Yes, that's what I suggest, and basically what the libpq batch interface does, though it expects the client to deal with the transaction boundaries. You will need to think hard about transaction boundaries as they relate to multi-statements unless nPgSQL parses out each statement from multi-statement strings like PgJDBC does. Otherwise a user can sneak in: somestatement; BEGIN; someotherstatement; or somestatement; CoMMiT; otherstatement; -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers