Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set

2019-07-25 Thread Peter Geoghegan
On Thu, Jul 25, 2019 at 3:10 PM Andres Freund wrote:
> > I agree that this is unfortunate. Are you planning on working on it?
>
> Not at the moment, no. Are you planning / hoping to take a stab at it?

The current behavior ought to be fixed, and it seems like it falls to
me to do that. OTOH, anything that's MultiXact adjacent makes my eyes
water. I don't consider myself to be particularly well qualified.

I'm sure that I could quickly find a way of making the behavior you
have pointed out match what is expected without causing any regression
tests to fail, but that's the easy part.
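
To make "what is expected" concrete: rerunning your test after such a
fix, the new row version should come back with xmax = 0 and no lock
bits. Something like this is the check I have in mind (assuming
pageinspect, and your "upsert" table from downthread):

SELECT lp, t_xmax,
       (t_infomask & 128) <> 0 AS xmax_lock_only -- 0x0080 = HEAP_XMAX_LOCK_ONLY
FROM heap_page_items(get_raw_page('upsert', 0));
-- hoped-for post-fix output: the new row version (lp 2) has t_xmax = 0
-- and xmax_lock_only = false

The hard part is convincing ourselves that skipping the xmax
carry-forward is safe in every MultiXact case.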

--
Peter Geoghegan




Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set

2019-07-25 Thread Andres Freund
Hi,

On 2019-07-24 17:14:39 -0700, Peter Geoghegan wrote:
> On Wed, Jul 24, 2019 at 4:24 PM Andres Freund wrote:
> > but we really don't need to do any of that in this case - the only
> > locker is the current backend, after all.
> >
> > I think this isn't great, because it'll later cause unnecessary
> > hint bit writes (although ones somewhat likely to be combined with setting
> > XMIN_COMMITTED), and even extra work for freezing.
> >
> > Based on a quick look, this wasn't the case before the finer grained
> > tuple locking - which makes sense, as there were no cases where locks would
> > need to be carried forward.
> 
> I agree that this is unfortunate. Are you planning on working on it?

Not at the moment, no. Are you planning / hoping to take a stab at it?

Greetings,

Andres Freund




Re: ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set

2019-07-24 Thread Peter Geoghegan
On Wed, Jul 24, 2019 at 4:24 PM Andres Freund wrote:
> As you can see, the same xmax is set for both row versions, with the new
> infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.

Meta remark about your test case: I am a big fan of microbenchmarks
like this, which execute simple DML queries using a single connection,
and then consider if the on-disk state looks as good as expected, for
some value of "good". I had a lot of success with this approach while
developing the v12 work on nbtree, where I went to the trouble of
automating everything. The same test suite also helped with the nbtree
compression/deduplication patch just today.

I like to call these tests "wind tunnel tests". It's far from obvious
that you can take a totally synthetic, serial test, and use it to
measure something that is important to real workloads. It seems to
work well when there is a narrow, specific thing that you're
interested in. This is especially true when there is a real risk of
regressing performance in some way.
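
A minimal sketch of the shape of such a test (table and index names
are made up here; the real suite compares much more of the pageinspect
output against known-good expected output):

CREATE TABLE wt(k int4 PRIMARY KEY);
INSERT INTO wt SELECT generate_series(1, 1000);
-- "wind tunnel" assertion: the on-disk details of the index, not just
-- the query results, must match what we expect
SELECT live_items, avg_item_size FROM bt_page_stats('wt_pkey', 1);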

> but we really don't need to do any of that in this case - the only
> locker is the current backend, after all.
>
> I think this isn't great, because it'll later cause unnecessary
> hint bit writes (although ones somewhat likely to be combined with setting
> XMIN_COMMITTED), and even extra work for freezing.
>
> Based on a quick look, this wasn't the case before the finer grained
> tuple locking - which makes sense, as there were no cases where locks would
> need to be carried forward.

I agree that this is unfortunate. Are you planning on working on it?

-- 
Peter Geoghegan




ON CONFLICT (and manual row locks) cause xmax of updated tuple to unnecessarily be set

2019-07-24 Thread Andres Freund
Hi,


Scenario is a very plain upsert:

CREATE TABLE upsert(key int primary key);
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT INTO upsert VALUES(1) ON CONFLICT (key) DO UPDATE SET key = excluded.key;
INSERT 0 1
INSERT 0 1

postgres[8755][1]=# SELECT page_items.*
FROM generate_series(0, pg_relation_size('upsert'::regclass::text) / 8192 - 1) blkno,
     get_raw_page('upsert'::regclass::text, blkno::int4) AS raw_page,
     heap_page_items(raw_page) as page_items;
┌────┬────────┬──────────┬────────┬──────────┬──────────┬──────────┬────────┬─────────────┬────────────┬────────┬────────┬────────┬────────────┐
│ lp │ lp_off │ lp_flags │ lp_len │  t_xmin  │  t_xmax  │ t_field3 │ t_ctid │ t_infomask2 │ t_infomask │ t_hoff │ t_bits │ t_oid  │   t_data   │
├────┼────────┼──────────┼────────┼──────────┼──────────┼──────────┼────────┼─────────────┼────────────┼────────┼────────┼────────┼────────────┤
│  1 │   8160 │        1 │     28 │ 19742591 │ 19742592 │        0 │ (0,2)  │       24577 │        256 │     24 │ (null) │ (null) │ \x01000000 │
│  2 │   8128 │        1 │     28 │ 19742592 │ 19742592 │        0 │ (0,2)  │       32769 │       8336 │     24 │ (null) │ (null) │ \x01000000 │
└────┴────────┴──────────┴────────┴──────────┴──────────┴──────────┴────────┴─────────────┴────────────┴────────┴────────┴────────┴────────────┘
(2 rows)

As you can see, the same xmax is set for both row versions, with the new
infomask being HEAP_XMAX_KEYSHR_LOCK | HEAP_XMAX_LOCK_ONLY | HEAP_UPDATED.
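
To decode that: with the flag values from src/include/access/htup_details.h
(HEAP_XMAX_KEYSHR_LOCK = 0x0010, HEAP_XMAX_LOCK_ONLY = 0x0080,
HEAP_UPDATED = 0x2000), the second tuple's t_infomask of 8336 is 0x2090,
i.e. exactly those three bits:

SELECT 8336 = (16 | 128 | 8192) AS matches; -- KEYSHR_LOCK | LOCK_ONLY | UPDATED
-- returns true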


The reason that happens is that ExecOnConflictUpdate() needs to lock the
row to be able to reliably compute a new tuple version based on that
row. heap_update() then decides it needs to carry that xmax forward, as
it's a valid lock:


    /*
     * If the tuple we're updating is locked, we need to preserve the locking
     * info in the old tuple's Xmax.  Prepare a new Xmax value for this.
     */
    compute_new_xmax_infomask(HeapTupleHeaderGetRawXmax(oldtup.t_data),
                              oldtup.t_data->t_infomask,
                              oldtup.t_data->t_infomask2,
                              xid, *lockmode, true,
                              &xmax_old_tuple, &infomask_old_tuple,
                              &infomask2_old_tuple);

    /*
     * And also prepare an Xmax value for the new copy of the tuple.  If there
     * was no xmax previously, or there was one but all lockers are now gone,
     * then use InvalidXid; otherwise, get the xmax from the old tuple.  (In
     * rare cases that might also be InvalidXid and yet not have the
     * HEAP_XMAX_INVALID bit set; that's fine.)
     */
    if ((oldtup.t_data->t_infomask & HEAP_XMAX_INVALID) ||
        HEAP_LOCKED_UPGRADED(oldtup.t_data->t_infomask) ||
        (checked_lockers && !locker_remains))
        xmax_new_tuple = InvalidTransactionId;
    else
        xmax_new_tuple = HeapTupleHeaderGetRawXmax(oldtup.t_data);

but we really don't need to do any of that in this case - the only
locker is the current backend, after all.
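
For contrast, the same UPDATE without a preceding row lock doesn't carry
anything forward - e.g. (a separate toy table, names made up):

CREATE TABLE plain(key int primary key);
INSERT INTO plain VALUES (1);
UPDATE plain SET key = key;
SELECT lp, t_xmax, t_infomask FROM heap_page_items(get_raw_page('plain', 0));
-- here the new row version gets t_xmax = 0, with HEAP_XMAX_INVALID
-- (0x0800) set in t_infomask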

I think this isn't great, because it'll later cause unnecessary
hint bit writes (although ones somewhat likely to be combined with setting
XMIN_COMMITTED), and even extra work for freezing.

Based on a quick look, this wasn't the case before the finer grained
tuple locking - which makes sense, as there were no cases where locks would
need to be carried forward.


It's worthwhile to note that this *nearly* already works, because of the
following codepath in compute_new_xmax_infomask():

        /*
         * If the lock to be acquired is for the same TransactionId as the
         * existing lock, there's an optimization possible: consider only the
         * strongest of both locks as the only one present, and restart.
         */
        if (xmax == add_to_xmax)
        {
            /*
             * Note that it's not possible for the original tuple to be
             * updated: we wouldn't be here because the tuple would have been
             * invisible and we wouldn't try to update it.  As a subtlety,
             * this code can also run when traversing an update chain to lock
             * future versions of a tuple.  But we wouldn't be here either,
             * because the add_to_xmax would be different from the original
             * updater.
             */
            Assert(HEAP_XMAX_IS_LOCKED_ONLY(old_infomask));

            /* acquire the strongest of both */
            if (mode < old_mode)
                mode = old_mode;
            /* mustn't touch is_update */