Over on [1] I'm proposing to delay locking partitions of a partitioned table that's the target of an INSERT or UPDATE command until we first route a tuple to the partition. Currently, we go and lock all partitions, even if we just insert a single tuple to a single partition. The patch in [1] improves the situation when there are many partitions and only a few tuples to route to just to a few partitions.
Over here and along similar lines to the above, but this time I'd like to take this even further and change things so we don't lock *any* partitions during AcquireExecutorLocks() and instead just lock them when we first access them with ExecGetRangeTableRelation(). This improves the situation when many partitions get run-time pruned, as we'll never bother locking those at all since we'll never call ExecGetRangeTableRelation() on them. We'll of course still lock the partitioned table so that plan invalidation works correctly. This does make the locking order less well defined, but I'm already proposing similar in [1] and over there I've mentioned that I can't quite see any huge issues with doing that. We already don't lock all partitions inside AcquireExecutorLocks() during INSERT with VALUES anyway. The attached patch implements this delayed locking. A quick benchmark shows a pretty good performance improvement when there are a large number of partitions and run-time pruning prunes almost all of them. Setup: create table hashp (a int) partition by hash(a); select 'create table hashp'|| x::text || ' partition of hashp for values with (modulus 10000, remainder ' || x ::text || ');' from generate_Series(0,9999) x; \gexec insert into hashp select generate_Series(1,1000000) bench.sql: \set p_a 13315 select * from hashp where a = :p_a; Master: 10000 parts $ pgbench -n -f bench.sql -M prepared -T 60 postgres tps = 108.882749 (excluding connections establishing) tps = 108.245437 (excluding connections establishing) delaylock: 10000 parts $ pgbench -n -f bench.sql -M prepared -T 60 postgres tps = 1068.289505 (excluding connections establishing) tps = 1092.797015 (excluding connections establishing) More could be done to make this quite a bit faster again, but that mostly requires the range table coming directly from the planner as an array and marking which array elements require locking with a Bitmapset. This'll save having to loop over the entire large array that mostly does not need anything locked. Similar can be done for ExecCheckRTPerms(), but that's also for another day. With those changes and some further tweaks done on the Append/MergeAppend code, tps is about 22k on my machine, which is just slightly slower than with an equivalent non-partitioned table. I'll add the attached patch to the January commitfest [1] https://www.postgresql.org/message-id/CAKJS1f-%3DFnMqmQP6qitkD%2BxEddxw22ySLP-0xFk3JAqUX2yfMw%40mail.gmail.com -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
v1-0001-Allow-lock-acquisitions-for-partitions-to-be-dela.patch
Description: Binary data