Hi,
Scenario is a very plain upsert:
CREATE TABLE upsert(key int primary key);
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT 0 1
INSERT 0 1
postgres[8755][1]=# SELECT page_items.* FROM generate_series(0,
pg_relation_size('upsert'::regclass::text) / 8192 - 1) blkno,
get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page,
heap_page_items(raw_page) as page_items;
┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │ t_data │
├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │       24577 │        256 │     24 │ (null) │ (null) │ \x0100 │
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │       32769 │       8336 │     24 │ (null) │ (null) │ \x0100 │
└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────┘
(2 rows)
As you can see, the same xmax is set for both row versions, with the new
version's infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY |
HEAP_UPDATED.
The reason that happens is that ExecOnConflictUpdate() needs to lock the
row to be able to reliably compute a new tuple version based on that
row. heap_update() then decides it needs to carry that xmax forward, as
it's a valid lock:
    /*
     * If the tuple we're updating is locked, we need to preserve the locking
     * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
     */
    compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                              oldtup.t_data->t_infomask,
                              oldtup.t_data->t_infomask2,
                              xid, *lockmode, true,
                              &xmax_old_tuple, &infomask_old_tuple,
                              &infomask2_old_tuple);
    /*
     * And also prepare an Xmax value for the new copy of the tuple.  If there
     * was no xmax previously, or there was one but all lockers are now gone,
     * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
     * rare cases that might also be InvalidXid and yet not have the
     * HEAP_XMAX_INVALID bit set; that's fine.)
     */
    if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
        HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
        (checked_lockers && !locker_remains))
        xmax_new_tuple = InvalidTransactionId;
    else
        xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);
but we really don't need to do any of that in this case - the only
locker is the current backend, after all.
I think this isn't great, because it'll later cause unnecessary hint bit
writes (although ones somewhat likely to be combined with setting
XMIN_COMMITTED), and even extra work for freezing.
Based on a quick look this wasn't the case before the finer-grained tuple
locking - which makes sense, as back then there were no cases where locks
would need to be carried forward.
It's worthwhile to note that this *nearly* already works, because of the
following codepath in compute_new_xmax_infomask():
        /*
         * If the lock to be acquired is for the same TransactionId as the
         * existing lock, there's an optimization possible: consider only the
         * strongest of both locks as the only one present, and restart.
         */
        if (xmax == add_to_xmax)
        {
            /*
             * Note that it's not possible for the original tuple to be
             * updated: we wouldn't be here because the tuple would have been
             * invisible and we wouldn't try to update it.  As a subtlety,
             * this code can also run when traversing an update chain to lock
             * future versions of a tuple.  But we wouldn't be here either,
             * because the add_to_xmax would be different from the original
             * updater.
             */
            Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

            /* acquire the strongest of both */
            if (mode < old_mode)
                mode = old_mode;
            /* mustn't touch is_update */