Hi! Currently _bt_search() always locks leaf buffer in shared mode (aka BT_READ), while caller can relock it later. However, I don't see what prevents _bt_search() from locking leaf immediately in exclusive mode (aka BT_WRITE) when required. When traversing downlink from non-leaf page of level 1 (lowest level of non-leaf pages just prior to leaf pages), we know that next page is going to be leaf. In this case, we can immediately lock next page in exclusive mode if required. For sure, it might happen that we didn't manage to exclusively lock leaf in this way when _bt_getroot() points us to leaf page. But this case could be handled in _bt_search() by relock. Please, find implementation of this approach in the attached patch.
I've run following simple test of this patch on 72-cores machine. # postgresql.conf max_connections = 300 shared_buffers = 32GB fsync = off synchronous_commit = off # DDL: CREATE UNLOGGED TABLE ordered (id serial primary key, value text not null); CREATE UNLOGGED TABLE unordered (i integer not null, value text not null); # script_ordered.sql INSERT INTO ordered (value) VALUES ('abcdefghijklmnoprsqtuvwxyz'); # script_unordered.sql \set i random(1, 1000000) INSERT INTO unordered VALUES (:i, 'abcdefghijklmnoprsqtuvwxyz'); # commands pgbench -T 60 -P 1 -M prepared -f script_ordered.sql -c 150 -j 150 postgres pgbench -T 60 -P 1 -M prepared -f script_unordered.sql -c 150 -j 150 postgres # results ordered, master: 157473 TPS ordered, patched 231374 TPS unordered, master: 232372 TPS unordered, patched: 232535 TPS As you can see, difference in unordered case is negligible Due to random inserts, concurrency for particular leafs is low. But ordered insertion is almost 50% faster on patched version. I wonder how could we miss such a simple optimization till now, but I also don't see this patch to brake anything. In patched version, it might appear that we have to traverse rightlinks in exclusive mode due to splits concurrent to downlink traversal. However, the same might happen in current master due to splits concurrent to relocks. So, I don't expect performance regression to be caused by this patch. ------ Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
btree-search-write-lock-1.patch
Description: Binary data