On Tue, Jul 25, 2017 at 3:02 PM, Peter Geoghegan <p...@bowt.ie> wrote: > I've been thinking about this a lot, because this really does look > like a pathological case to me. I think that this workload is very > sensitive to how effective kill_prior_tuples/LP_DEAD hinting is. Or at > least, I can imagine that mechanism doing a lot better than it > actually manages to do here. I wonder if it's possible that commit > 2ed5b87f9, which let MVCC snapshots not hold on to a pin on leaf > pages, should have considered workloads like this.
While the benchmark Alik came up with is non-trivial to reproduce, I can show a consistent regression for a simple case with only one active backend. I'm not sure whether or not this is considered an acceptable trade-off -- I didn't look through the archives from around March of 2015 just yet. Setup: Initialize pgbench (any scale factor). create index on pgbench_accounts (aid); The point of this example is to show a simple query that is never quite HOT-safe, where we need to create a new index entry in an index for that reason alone. Theoretically, only one index needs to be updated, not all indexes, because only one index covers attributes that have truly changed. For example, if we had WARM, I think that many of these theoretically unnecessary index tuple insertions would not happen. When they do happen, because we don't have something like WARM, it seems important that the kill_prior_tuples/LP_DEAD stuff does its job. I don't think that it will consistently do that, though. Steps: (Set debugger breakpoint from within _bt_killitems()) Test queries (run these from a single interactive psql session): update pgbench_accounts set abalance = abalance + 1 where aid = 763; update pgbench_accounts set abalance = abalance + 1 where aid = 763; update pgbench_accounts set abalance = abalance + 1 where aid = 763; We now see that no update ever kills items within _bt_killitems(), because our own update to the index leaf page itself nullifies our ability to kill anything, by changing the page LSN from the one stashed in the index scan state variable. Fortunately, we are not really "self-blocking" dead item cleanup here, because the _bt_check_unique() logic for killing all_dead index entries saves the day by not caring about LSNs. However, that only happens because the index on "aid" is a unique index. If we drop the primary key, and create a regular index on "aid" in its place, then the same UPDATE queries really do self-block from killing index tuples due to changes in 2ed5b87f9, which could be pretty bad if there wasn't some selects to do the kill_prior_tuple stuff instead. I verified that this regression against 9.4 exists, just to be sure that the problem wasn't somehow there all along -- the regression is real. :-( -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers