Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-03 Thread Peter Geoghegan
On Tue, Mar 3, 2015 at 12:05 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 My experimental branch works just fine (with a variant of jjanes_upsert
 with subxact looping), until I need to restart an update after a
 failed heap_update() that still returned HeapTupleMayBeUpdated
 (having super deleted within an ExecUpdate() call). There is no good
 way to do that for ExecUpdate() that I can see, because an existing,
 visible row is affected (unlike with ExecInsert()). Even if it was
 possible, it would be hugely invasive to already very complicated code
 paths.

 Ah, so we can't easily use super-deletion to back out an UPDATE. I had not
 considered that.

Yeah. When I got into considering making EvalPlanQualFetch() look at
speculative tokens, it became abundantly clear that that code would
never be committed, even if I could make it work.

 I continue to believe that the best way forward is to incrementally
 commit the work by committing ON CONFLICT IGNORE first. That way,
 speculative tokens will remain strictly the concern of UPSERTers or
 sessions doing INSERT ... ON CONFLICT IGNORE.


 Ok, let's try that. Can you cut a patch that does just ON CONFLICT IGNORE,
 please?

Of course. I'll have that for you shortly.

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-03 Thread Heikki Linnakangas

On 03/02/2015 11:21 PM, Peter Geoghegan wrote:

On Mon, Mar 2, 2015 at 12:15 PM, Heikki Linnakangas hlinn...@iki.fi wrote:

Hmm. I used a b-tree to estimate the effect that the locking would have in
the UPSERT case, for UPSERT into a table with a b-tree index. But you're
right that for the question of whether this is acceptable for the case of
regular insert into a table with exclusion constraints, other indexams are
more interesting. In a very quick test, the overhead with a single GiST
index seems to be about 5%. IMHO that's still a noticeable slowdown, so
let's try to find a way to eliminate it.


Honestly, I don't know why you're so optimistic that this can be
fixed, even with that heavyweight lock (for regular inserters +
exclusion constraints).


We already concluded that it can be fixed, with the heavyweight lock. 
See http://www.postgresql.org/message-id/54f4a0e0.4070...@iki.fi. Do you 
see some new problem with that that hasn't been discussed yet? To 
eliminate the heavy-weight lock, we'll need some new ideas, but it 
doesn't seem that hard.



My experimental branch, which I showed you privately, shows big
problems when I simulate upserting with exclusion constraints (so we
insert first, handle exclusion violations using the traditional upsert
subxact pattern that does work with B-Trees). It's much harder to back
out of a heap_update() than it is to back out of a heap_insert(),
since we might well be updating a tuple visible to some other MVCC
snapshot. Whereas we can super delete a tuple knowing that only a
dirty snapshot might have seen it, which bounds the complexity (only
wait sites for B-Trees + exclusion constraints need to worry about
speculative insertion tokens and so on).

My experimental branch works just fine (with a variant of jjanes_upsert
with subxact looping), until I need to restart an update after a
failed heap_update() that still returned HeapTupleMayBeUpdated
(having super deleted within an ExecUpdate() call). There is no good
way to do that for ExecUpdate() that I can see, because an existing,
visible row is affected (unlike with ExecInsert()). Even if it was
possible, it would be hugely invasive to already very complicated code
paths.


Ah, so we can't easily use super-deletion to back out an UPDATE. I had 
not considered that.



I continue to believe that the best way forward is to incrementally
commit the work by committing ON CONFLICT IGNORE first. That way,
speculative tokens will remain strictly the concern of UPSERTers or
sessions doing INSERT ... ON CONFLICT IGNORE.


Ok, let's try that. Can you cut a patch that does just ON CONFLICT 
IGNORE, please?


- Heikki




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-02 Thread Peter Geoghegan
On Mon, Mar 2, 2015 at 12:15 PM, Heikki Linnakangas hlinn...@iki.fi wrote:
 Hmm. I used a b-tree to estimate the effect that the locking would have in
 the UPSERT case, for UPSERT into a table with a b-tree index. But you're
 right that for the question of whether this is acceptable for the case of
 regular insert into a table with exclusion constraints, other indexams are
 more interesting. In a very quick test, the overhead with a single GiST
 index seems to be about 5%. IMHO that's still a noticeable slowdown, so
 let's try to find a way to eliminate it.

Honestly, I don't know why you're so optimistic that this can be
fixed, even with that heavyweight lock (for regular inserters +
exclusion constraints).

My experimental branch, which I showed you privately, shows big
problems when I simulate upserting with exclusion constraints (so we
insert first, handle exclusion violations using the traditional upsert
subxact pattern that does work with B-Trees). It's much harder to back
out of a heap_update() than it is to back out of a heap_insert(),
since we might well be updating a tuple visible to some other MVCC
snapshot. Whereas we can super delete a tuple knowing that only a
dirty snapshot might have seen it, which bounds the complexity (only
wait sites for B-Trees + exclusion constraints need to worry about
speculative insertion tokens and so on).
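
For reference, the pattern I mean is roughly the following (a sketch
with made-up names, assuming a table foo (key int primary key, val
text); with an exclusion constraint the trapped condition would be
exclusion_violation rather than unique_violation):

create function merge_foo(k int, v text) returns void as
$$
begin
    loop
        -- first, try to update an existing row
        update foo set val = v where key = k;
        if found then
            return;
        end if;
        -- no row to update: try to insert, within a subtransaction
        begin
            insert into foo (key, val) values (k, v);
            return;
        exception when unique_violation then
            -- a concurrent insert beat us; loop and retry the update
        end;
    end loop;
end;
$$ language plpgsql;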

My experimental branch works just fine (with a variant of jjanes_upsert
with subxact looping), until I need to restart an update after a
failed heap_update() that still returned HeapTupleMayBeUpdated
(having super deleted within an ExecUpdate() call). There is no good
way to do that for ExecUpdate() that I can see, because an existing,
visible row is affected (unlike with ExecInsert()). Even if it was
possible, it would be hugely invasive to already very complicated code
paths.

I continue to believe that the best way forward is to incrementally
commit the work by committing ON CONFLICT IGNORE first. That way,
speculative tokens will remain strictly the concern of UPSERTers or
sessions doing INSERT ... ON CONFLICT IGNORE. Unless you're happy to
have UPDATEs still deadlock, and only fix unprincipled deadlocks for
the INSERT case. I don't know why you want to make the patch
incremental by fixing unprincipled deadlocks in the first place,
since you've said that you don't really care about it as a real world
problem, and because it now appears to have significant additional
difficulties that were not anticipated.
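
For concreteness, the two variants being weighed here look roughly like
this (illustrative table and values; the exact grammar is still in
flux):

insert into tab (key, val) values (1, 'foo') on conflict ignore;

insert into tab (key, val) values (1, 'foo') on conflict update set val = 'foo';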

Please, let's focus on getting ON CONFLICT IGNORE into a committable
state - that's the best way of incrementally committing this work.
Fixing unprincipled deadlocks is not a useful way of incrementally
committing this work. I've spent several days producing a prototype
that shows the exact nature of the problem. If you think I'm mistaken,
and that fixing unprincipled deadlocks first is the right thing to do,
please explain why with reference to that prototype. AFAICT, doing
things your way is going to add significant additional complexity for
*no appreciable benefit*. You've already gotten exactly what you were
looking for in every other regard. In particular, ON CONFLICT UPDATE
could work with exclusion constraints without any of these problems,
because of the way we do the auxiliary update there (we lock the row
ahead of updating/qual evaluation). I've bent over backwards to make
sure that is the case. Please, throw me a bone here.

Thank you
-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-02 Thread Heikki Linnakangas

On 02/21/2015 10:41 PM, Peter Geoghegan wrote:

On Sat, Feb 21, 2015 at 11:15 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

What I had in mind is that the winning inserter waits on the other
inserter's token, without super-deleting. Like all inserts do today. So the
above scenario becomes:

* Session 1 physically inserts, and then checks for a conflict.

* Session 2 physically inserts, and then checks for a conflict.

* Session 1 sees session 2's conflicting TID. Session 1's XID is older, so
it wins. It waits for session 2's token, without super-deleting.

* Session 2 sees session 1's conflicting TID. It super deletes,
releases token, and sleeps on session 1's token.

* Session 1 wakes up. It looks at session 2's tuple again and sees that it
was super-deleted. There are no further conflicts, so the insertion is
complete, and it releases the token.

* Session 2 wakes up. It looks at session 1's tuple again and sees that it's
still there. It goes back to sleep, this time on session 1's XID.

* Session 1 commits. Session 2 wakes up, sees that the tuple is still there,
and throws a constraint violation error.


I think we're actually 100% in agreement, then. I just prefer to have
the second last step (the check without a promise tuple visible to
anyone made by the loser) occur as part of the pre-check that
happens anyway with ON CONFLICT IGNORE. Otherwise, you'll end up with
some much more complicated control flow that has to care about not
doing that twice for ON CONFLICT IGNORE...and for the benefit of what?
For regular inserters, that we don't actually care about fixing
unprincipled deadlocks for?


Right. That will allow me to review and test the locking part of the 
patch separately.


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-02 Thread Heikki Linnakangas

On 02/17/2015 02:11 AM, Peter Geoghegan wrote:



Whatever works, really. I can't say that the performance implications
of acquiring that hwlock are at the forefront of my mind. I never
found that to be a big problem on an 8 core box, relative to vanilla
INSERTs, FWIW - lock contention is not normal, and may be where any
heavyweight lock costs would really be encountered.


Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).

Seems that way. But even if that wasn't true, it wouldn't matter,
since I don't see that we have a choice.


I did some quick performance testing on this. For easy testing, I used a 
checkout of git master, and simply added LockAcquire + LockRelease calls 
to ExecInsert, around the heap_insert() call. The test case I used was:


psql -c "create table footest (id serial primary key);"

echo "insert into footest select from generate_series(1, 1);" > inserts.sql

pgbench -n -f inserts.sql postgres -T100 -c4

With the extra lock calls, I got 56 tps on my laptop. With unpatched git 
master, I got 60 tps. I also looked at the profile with perf, and 
indeed about 10% of the CPU time was spent in LockAcquire and 
LockRelease together.


So the extra locking incurs about 10% overhead. I think this was pretty 
much a worst case scenario, but not a hugely unrealistic one - many 
real-world tables have only a few columns, and few indexes. With more 
CPUs you would probably start to see contention, in addition to just the 
extra overhead.


Are we OK with a 10% overhead, caused by the locking? That's probably 
acceptable if that's what it takes to get UPSERT. But it's not OK just 
to solve the deadlock issue with regular insertions into a table with 
exclusion constraints. Can we find a scheme to eliminate that overhead?


- Heikki




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-02 Thread Peter Geoghegan
On Mon, Mar 2, 2015 at 11:20 AM, Heikki Linnakangas hlinn...@iki.fi wrote:
 Are we OK with a 10% overhead, caused by the locking? That's probably
 acceptable if that's what it takes to get UPSERT. But it's not OK just to
 solve the deadlock issue with regular insertions into a table with exclusion
 constraints. Can we find a scheme to eliminate that overhead?

Looks like you tested a B-Tree index here. That doesn't seem
particularly representative of what you'd see with exclusion
constraints.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-03-02 Thread Heikki Linnakangas

On 03/02/2015 09:29 PM, Peter Geoghegan wrote:

On Mon, Mar 2, 2015 at 11:20 AM, Heikki Linnakangas hlinn...@iki.fi wrote:

Are we OK with a 10% overhead, caused by the locking? That's probably
acceptable if that's what it takes to get UPSERT. But it's not OK just to
solve the deadlock issue with regular insertions into a table with exclusion
constraints. Can we find a scheme to eliminate that overhead?


Looks like you tested a B-Tree index here. That doesn't seem
particularly representative of what you'd see with exclusion
constraints.


Hmm. I used a b-tree to estimate the effect that the locking would have 
in the UPSERT case, for UPSERT into a table with a b-tree index. But 
you're right that for the question of whether this is acceptable for the 
case of regular insert into a table with exclusion constraints, other 
indexams are more interesting. In a very quick test, the overhead with a 
single GiST index seems to be about 5%. IMHO that's still a noticeable 
slowdown, so let's try to find a way to eliminate it.
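
For reference, a GiST variant of the earlier footest case might look 
like this (a sketch, using btree_gist to get a GiST opclass for plain 
integers; not necessarily the exact test I ran):

create extension btree_gist;
create table footest (id serial);
create index footest_gist on footest using gist (id);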


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-21 Thread Heikki Linnakangas

On 02/21/2015 12:15 AM, Peter Geoghegan wrote:

On Fri, Feb 20, 2015 at 1:07 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Then I refuse to believe that the livelock hazard exists, without the
pre-check. If you have a livelock scenario in mind, it really shouldn't be
that difficult to write down the list of steps.


I just meant practical, recreatable steps - a test case. I should
emphasize that what I'm saying is not that important. Even if I am
wrong, I'm not suggesting that we do anything that we don't both agree
is needed anyway. If I'm right, that is merely an impediment to
incrementally committing the work by fixing exclusion constraints,
AFAICT. Ultimately, that isn't all that important. Anyway, here is how
I think livelocks could happen, in theory, with regular insertion into
a table with exclusion constraints, with your patch [1] applied (which
has no pre-check):

* Session 1 physically inserts, and then checks for a conflict.

* Session 2 physically inserts, and then checks for a conflict.

* Session 1 sees session 2's conflicting TID, then super deletes and
releases token.

* Session 2 sees session 1's conflicting TID, then super deletes and
releases token.

* Session 1 waits or tries to wait on session 2's token. It isn't held
anymore, or is only held for an instant.

* Session 2 waits or tries to wait on session 1's token. It isn't held
anymore, or is only held for an instant.

* Session 1 restarts from scratch, having made no useful progress in
respect of the slot being inserted.

* Session 2 restarts from scratch, having made no useful progress in
respect of the slot being inserted.

(Livelock)

If there is a tie-break on XID (you omitted this from your patch [1]
but acknowledge it as an omission), then that doesn't really change
anything (without adding a pre-check, too). That's because: What do we
actually do or not do when we're the oldest XID, that gets to win?


Ah, ok, I can see the confusion now.


Do we not wait on anything, and just declare that we're done? Then I
think that breaks exclusion constraint enforcement, because we need to
rescan the index to do that (i.e., goto retry). Do we wait on their
token, as my most recent revision does, but *without* a pre-check, for
regular inserters? Then I think that our old tuple could keep multiple
other sessions spinning indefinitely.


What I had in mind is that the winning inserter waits on the other 
inserter's token, without super-deleting. Like all inserts do today. So 
the above scenario becomes:


* Session 1 physically inserts, and then checks for a conflict.

* Session 2 physically inserts, and then checks for a conflict.

* Session 1 sees session 2's conflicting TID. Session 1's XID is older, 
so it wins. It waits for session 2's token, without super-deleting.


* Session 2 sees session 1's conflicting TID. It super deletes,
releases token, and sleeps on session 1's token.

* Session 1 wakes up. It looks at session 2's tuple again and sees that 
it was super-deleted. There are no further conflicts, so the insertion 
is complete, and it releases the token.


* Session 2 wakes up. It looks at session 1's tuple again and sees that 
it's still there. It goes back to sleep, this time on session 1's XID.


* Session 1 commits. Session 2 wakes up, sees that the tuple is still 
there, and throws a constraint violation error.


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-21 Thread Peter Geoghegan
On Sat, Feb 21, 2015 at 11:15 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Ah, ok, I can see the confusion now.

Cool.

 Do we not wait on anything, and just declare that we're done? Then I
 think that breaks exclusion constraint enforcement, because we need to
 rescan the index to do that (i.e., goto retry). Do we wait on their
 token, as my most recent revision does, but *without* a pre-check, for
 regular inserters? Then I think that our old tuple could keep multiple
 other sessions spinning indefinitely.

 What I had in mind is that the winning inserter waits on the other
 inserter's token, without super-deleting. Like all inserts do today. So the
 above scenario becomes:

 * Session 1 physically inserts, and then checks for a conflict.

 * Session 2 physically inserts, and then checks for a conflict.

 * Session 1 sees session 2's conflicting TID. Session 1's XID is older, so
 it wins. It waits for session 2's token, without super-deleting.

 * Session 2 sees session 1's conflicting TID. It super deletes,
 releases token, and sleeps on session 1's token.

 * Session 1 wakes up. It looks at session 2's tuple again and sees that it
 was super-deleted. There are no further conflicts, so the insertion is
 complete, and it releases the token.

 * Session 2 wakes up. It looks at session 1's tuple again and sees that it's
still there. It goes back to sleep, this time on session 1's XID.

 * Session 1 commits. Session 2 wakes up, sees that the tuple is still there,
and throws a constraint violation error.

I think we're actually 100% in agreement, then. I just prefer to have
the second last step (the check without a promise tuple visible to
anyone made by the loser) occur as part of the pre-check that
happens anyway with ON CONFLICT IGNORE. Otherwise, you'll end up with
some much more complicated control flow that has to care about not
doing that twice for ON CONFLICT IGNORE...and for the benefit of what?
For regular inserters, that we don't actually care about fixing
unprincipled deadlocks for?

Are we on the same page now?
-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-20 Thread Peter Geoghegan
On Fri, Feb 20, 2015 at 1:07 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Then I refuse to believe that the livelock hazard exists, without the
 pre-check. If you have a livelock scenario in mind, it really shouldn't be
 that difficult to write down the list of steps.

I just meant practical, recreatable steps - a test case. I should
emphasize that what I'm saying is not that important. Even if I am
wrong, I'm not suggesting that we do anything that we don't both agree
is needed anyway. If I'm right, that is merely an impediment to
incrementally committing the work by fixing exclusion constraints,
AFAICT. Ultimately, that isn't all that important. Anyway, here is how
I think livelocks could happen, in theory, with regular insertion into
a table with exclusion constraints, with your patch [1] applied (which
has no pre-check):

* Session 1 physically inserts, and then checks for a conflict.

* Session 2 physically inserts, and then checks for a conflict.

* Session 1 sees session 2's conflicting TID, then super deletes and
releases token.

* Session 2 sees session 1's conflicting TID, then super deletes and
releases token.

* Session 1 waits or tries to wait on session 2's token. It isn't held
anymore, or is only held for an instant.

* Session 2 waits or tries to wait on session 1's token. It isn't held
anymore, or is only held for an instant.

* Session 1 restarts from scratch, having made no useful progress in
respect of the slot being inserted.

* Session 2 restarts from scratch, having made no useful progress in
respect of the slot being inserted.

(Livelock)

If there is a tie-break on XID (you omitted this from your patch [1]
but acknowledge it as an omission), then that doesn't really change
anything (without adding a pre-check, too). That's because: What do we
actually do or not do when we're the oldest XID, that gets to win?
Do we not wait on anything, and just declare that we're done? Then I
think that breaks exclusion constraint enforcement, because we need to
rescan the index to do that (i.e., goto retry). Do we wait on their
token, as my most recent revision does, but *without* a pre-check, for
regular inserters? Then I think that our old tuple could keep multiple
other sessions spinning indefinitely. Only by checking for conflicts
*first*, before having inserted a (non-super-deleted) physical index
tuple, can these other sessions notice that there is a conflict *without
creating more conflicts*, which is what I believe is really needed. At the very
least it's something I'm much more comfortable with, and that seems
like reason enough to do it that way, given that we don't actually
care about unprincipled deadlocks with regular inserters with
exclusion constraints. Why take the chance with livelocking such
inserters, though?

I hope that we don't get bogged down on this, because, as I said, it
doesn't matter too much. I'm tempted to concede the point for that
reason, since the livelock is probably virtually impossible. I'm just
giving you my opinion on how to make the handling of exclusion
constraints as reliable as possible.

Thanks

[1] http://www.postgresql.org/message-id/54dfc6f8.5050...@vmware.com
-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-20 Thread Heikki Linnakangas

On 02/20/2015 10:39 PM, Peter Geoghegan wrote:

On Fri, Feb 20, 2015 at 11:34 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

So, um, are you agreeing that there is no problem? Or did I misunderstand?
If you see a potential issue here, can you explain it as a simple list of
steps, please.


Yes. I'm saying that AFAICT, there is no livelock hazard provided
other sessions must do the pre-check (as they must for ON CONFLICT
IGNORE). So I continue to believe that they must pre-check, which you
questioned.
...
Hard to break down the problem into steps, since it isn't a problem
that I was able to recreate (as a noticeable livelock).


Then I refuse to believe that the livelock hazard exists, without the 
pre-check. If you have a livelock scenario in mind, it really shouldn't 
be that difficult to write down the list of steps.

- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-20 Thread Peter Geoghegan
On Fri, Feb 20, 2015 at 11:34 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 So, um, are you agreeing that there is no problem? Or did I misunderstand?
 If you see a potential issue here, can you explain it as a simple list of
 steps, please.

Yes. I'm saying that AFAICT, there is no livelock hazard provided
other sessions must do the pre-check (as they must for ON CONFLICT
IGNORE). So I continue to believe that they must pre-check, which you
questioned.

The only real downside here (with not doing the pre-check for regular
inserters with exclusion constraints) is that we can't fix
unprincipled deadlocks for general exclusion constraint inserters
(since we don't want to make them pre-check), but we don't actually
care about that directly. But it also means that it's hard to see how
we can incrementally commit such a fix to break down the patch into
more manageable chunks, which is a little unfortunate.

Hard to break down the problem into steps, since it isn't a problem
that I was able to recreate (as a noticeable livelock). But the point
of what I was saying is that the first phase pre-check lets the other
sessions notice the tuple of the one session that got stuck waiting in
the second phase (those conflicters see its tuple, and so don't insert
a new tuple this iteration).

Conventional insertion with exclusion constraints inserts first and
only then looks for conflicts (since there is no pre-check). More
concretely, if you're the transaction that does not "break" here,
within check_exclusion_or_unique_constraint(), and so unexpectedly
waits in the second phase:

+ /*
+  * At this point we have either a conflict or a potential conflict. If
+  * we're not supposed to raise error, just return the fact of the
+  * potential conflict without waiting to see if it's real.
+  */
+ if (violationOK && !wait)
+ {
+ /*
+  * For unique indexes, detecting conflict is coupled with physical
+  * index tuple insertion, so we won't be called for recheck
+  */
+ Assert(!indexInfo->ii_Unique);
+
+ conflict = true;
+ if (conflictTid)
+ *conflictTid = tup->t_self;
+
+ /*
+  * Livelock insurance.
+  *
+  * When doing a speculative insertion pre-check, we cannot have an
+  * unprincipled deadlock with another session, fundamentally
+  * because there is no possible mutual dependency, since we only
+  * hold a lock on our token, without attempting to lock anything
+  * else (maybe this is not the first iteration, but no matter;
+  * we'll have super deleted and released insertion token lock if
+  * so, and all locks needed are already held.  Also, our XID lock
+  * is irrelevant.)
+  *
+  * In the second phase, where there is a re-check for conflicts, we
+  * can't deadlock either (we never lock another thing, since we
+  * don't wait in that phase).  However, a theoretical livelock
+  * hazard exists:  Two sessions could each see each other's
+  * conflicting tuple, and each could go and delete, retrying
+  * forever.
+  *
+  * To break the mutual dependency, we may wait on the other xact
+  * here over our caller's request to not do so (in the second
+  * phase).  This does not imply the risk of unprincipled deadlocks
+  * either, because if we end up unexpectedly waiting, the other
+  * session will super delete its own tuple *before* releasing its
+  * token lock and freeing us, and without attempting to wait on us
+  * to release our token lock.  We'll take another iteration here,
+  * after waiting on the other session's token, not find a conflict
+  * this time, and then proceed (assuming we're the oldest XID).
+  *
+  * N.B.:  Unprincipled deadlocks are still theoretically possible
+  * with non-speculative insertion with exclusion constraints, but
+  * this seems inconsequential, since an error was inevitable for
+  * one of the sessions anyway.  We only worry about speculative
+  * insertion's problems, since they're likely with idiomatic usage.
+  */
+ if (TransactionIdPrecedes(xwait, GetCurrentTransactionId()))
+ break;  /* go and super delete/restart speculative insertion */
+ }
+

Then you must successfully insert when you finally goto retry and do
another iteration within check_exclusion_or_unique_constraint(). The
other conflicters can't have failed to notice your pre-existing tuple,
and can't have failed to super delete their own tuples before you are
woken (provided you really are the single oldest XID).

Is that any clearer?
-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-20 Thread Heikki Linnakangas

On 02/19/2015 10:09 PM, Peter Geoghegan wrote:

On Thu, Feb 19, 2015 at 11:10 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I fully agree with your summary here. However, why should we suppose
that while we wait, the other backends don't both delete and then
re-insert their tuple? They need the pre-check to know not to
re-insert their tuple (seeing our tuple, immediately after we wake as
the preferred backend with the older XID) in order to break the race.
But today, exclusion constraints are optimistic in that the insert
happens first, and only then the check. The pre-check turns that the
other way around, in a limited though necessary sense.


I'm not sure I understand exactly what you're saying, but AFAICS the
pre-check doesn't completely solve that either. It's entirely possible that
the other backend deletes its tuple, our backend then performs the
pre-check, and the other backend re-inserts its tuple again. Sure, the
pre-check reduces the chances, but we're talking about a rare condition to
begin with, so I don't think it makes sense to add much code just to reduce
the chances further.


But super deletion occurs *before* releasing the token lock, which is
the last thing we do before looping around and starting again. So iff
we're the oldest XID, the one that gets to win by unexpectedly
waiting on another's token in our second phase (second call to
check_exclusion_or_unique_constraint()), we will not, in fact, see
anyone else's tuple, because they'll all be forced to go through the
first phase and find our pre-existing, never-deleted tuple, so we
can't see any new tuple from them. And, because they super delete
before releasing their token, they'll definitely have super deleted
when we're woken up, so we can't see any old/existing tuple either. We
have our tuple inserted this whole time - ergo, we do, in fact, win
reliably.


So, um, are you agreeing that there is no problem? Or did I 
misunderstand? If you see a potential issue here, can you explain it as 
a simple list of steps, please.


- Heikki




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-19 Thread Peter Geoghegan
On Thu, Feb 19, 2015 at 11:10 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I fully agree with your summary here. However, why should we suppose
 that while we wait, the other backends don't both delete and then
 re-insert their tuple? They need the pre-check to know not to
 re-insert their tuple (seeing our tuple, immediately after we wake as
 the preferred backend with the older XID) in order to break the race.
 But today, exclusion constraints are optimistic in that the insert
 happens first, and only then the check. The pre-check turns that the
 other way around, in a limited though necessary sense.

 I'm not sure I understand exactly what you're saying, but AFAICS the
 pre-check doesn't completely solve that either. It's entirely possible that
 the other backend deletes its tuple, our backend then performs the
 pre-check, and the other backend re-inserts its tuple again. Sure, the
 pre-check reduces the chances, but we're talking about a rare condition to
 begin with, so I don't think it makes sense to add much code just to reduce
 the chances further.

But super deletion occurs *before* releasing the token lock, which is
the last thing we do before looping around and starting again. So iff
we're the oldest XID, the one that gets to win by unexpectedly
waiting on another's token in our second phase (second call to
check_exclusion_or_unique_constraint()), we will not, in fact, see
anyone else's tuple, because they'll all be forced to go through the
first phase and find our pre-existing, never-deleted tuple, so we
can't see any new tuple from them. And, because they super delete
before releasing their token, they'll definitely have super deleted
when we're woken up, so we can't see any old/existing tuple either. We
have our tuple inserted this whole time - ergo, we do, in fact, win
reliably.

The fly in the ointment is regular inserters, perhaps, but we've
agreed that they're not too important here, and even when that happens
we're in deadlock land, not livelock land, which is obviously a
nicer place to live.

Does that make sense?
-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-19 Thread Heikki Linnakangas

On 02/18/2015 11:43 PM, Peter Geoghegan wrote:

Heikki seemed to think that the deadlock problems were not really
worth fixing independently of ON CONFLICT UPDATE support, but rather
represented a useful way of committing code incrementally. Do I have
that right?


Yes.


The way I chose to break the livelock (what I call "livelock
insurance") does indeed involve comparing XIDs, which Heikki thought
most promising. But it also involves having the oldest XID wait on
another session's speculative token in the second phase, which
ordinarily does not occur in the second
phase/check_exclusion_or_unique_constraint() call. The idea is that
one session is guaranteed to be the waiter that has a second iteration
within its second-phase check_exclusion_or_unique_constraint() call,
where (following the super deletion of conflict TIDs by the other
conflicting session or sessions) it reliably finds that it can proceed
(those other sessions are denied the opportunity to reach their second
phase by our second phase waiter's still-not-super-deleted tuple).

However, it's difficult to see how to map this on to general exclusion
constraint insertion + enforcement. In Heikki's recent sketch of this
[1], there is no pre-check, since that is considered an UPSERT thing
deferred until a later patch, and therefore my scheme here cannot work
(recall that for plain inserts with exclusion constraints, we insert
first and check last). I have a hard time justifying adding the
pre-check for plain exclusion constraint inserters given the total
lack of complaints from the field regarding unprincipled deadlocks,
and given the fact that it would probably make the code more
complicated than it needs to be. It is critical that there be a
pre-check to prevent livelock, though, because otherwise the
restarting sessions can go straight to their second phase
(check_exclusion_or_unique_constraint() call), without ever realizing
that they should not do so.


Hmm. I haven't looked at your latest patch, but I don't think you need 
to pre-check for this to work. To recap, the situation is that two 
backends have already inserted the heap tuple, and then see that the 
other backend's tuple conflicts. To avoid a livelock, it's enough that 
one backend super-deletes its own tuple first, before waiting for the 
other to complete, while the other backend waits without 
super-deleting. No?



It seems like the livelock insurance is pretty close to or actually
free, and doesn't imply that a second phase wait for token needs to
happen too often. With heavy contention on a small number of possible
tuples (100), and 8 clients deleting and inserting, I can see it
happening only once every couple of hundred milliseconds on my laptop.
It's not hard to imagine why the code didn't obviously appear to be
broken before now, as the window for an actual livelock must have been
small. Also, deadlocks bring about more deadlocks (since the deadlock
detector kicks in), whereas livelocks do not bring about more
livelocks.


It might be easier to provoke the livelocks with a GiST opclass that's 
unusually slow. I wrote the attached opclass for the purpose of testing 
this a while ago, but I haven't actually gotten around to doing much with 
it. It's called "useless_gist", because it's a GiST opclass for 
integers, like btree_gist, but the penalty and picksplit functions are 
totally dumb. The result is that the tuples are put into the index in 
pretty much random order, and every scan has to scan the whole index. 
I'm posting it here, in the hope that it happens to be useful, but I 
don't really know if it is.


- Heikki



useless_gist.tar.gz
Description: application/gzip



Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-19 Thread Peter Geoghegan
On Thu, Feb 19, 2015 at 5:21 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Hmm. I haven't looked at your latest patch, but I don't think you need to
 pre-check for this to work. To recap, the situation is that two backends
 have already inserted the heap tuple, and then see that the other backend's
 tuple conflicts. To avoid a livelock, it's enough that one backend
 super-deletes its own tuple first, before waiting for the other to complete,
 while the other backend waits without super-deleting. No?

I fully agree with your summary here. However, why should we suppose
that while we wait, the other backends don't both delete and then
re-insert their tuple? They need the pre-check to know not to
re-insert their tuple (seeing our tuple, immediately after we wake as
the preferred backend with the older XID) in order to break the race.
But today, exclusion constraints are optimistic in that the insert
happens first, and only then the check. The pre-check turns that the
other way around, in a limited though necessary sense.

Granted, it's unlikely that we'd livelock due to one session always
deleting and then nullifying that by re-inserting in time, but the
theoretical risk seems real. Therefore, I'm not inclined to bother
committing something without a pre-check, particularly since we're not
really interested in fixing unprincipled deadlocks for ordinary
exclusion constraint inserters (useful to know that you also think
that doesn't matter, BTW). Does that make sense?

This is explained within "livelock insurance" new-to-V2.3 comments in
check_exclusion_or_unique_constraint().  (Not that I think that
explanation is easier to follow than this one).

 It might be easier to provoke the livelocks with a GiST opclass that's
 unusually slow. I wrote the attached opclass for the purpose of testing this
 a while ago, but I haven't actually gotten around to doing much with it. It's
 called "useless_gist", because it's a GiST opclass for integers, like
 btree_gist, but the penalty and picksplit functions are totally dumb. The
 result is that the tuples are put into the index in pretty much random order,
 and every scan has to scan the whole index. I'm posting it here, in the hope
 that it happens to be useful, but I don't really know if it is.

Thanks. I'll try and use this for testing. Haven't been able to break
exclusion constraints with the jjanes_upsert test suite in a long
time, now.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-19 Thread Heikki Linnakangas

On 02/19/2015 08:16 PM, Peter Geoghegan wrote:

On Thu, Feb 19, 2015 at 5:21 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

Hmm. I haven't looked at your latest patch, but I don't think you need to
pre-check for this to work. To recap, the situation is that two backends
have already inserted the heap tuple, and then see that the other backend's
tuple conflicts. To avoid a livelock, it's enough that one backend
super-deletes its own tuple first, before waiting for the other to complete,
 while the other backend waits without super-deleting. No?


I fully agree with your summary here. However, why should we suppose
that while we wait, the other backends don't both delete and then
re-insert their tuple? They need the pre-check to know not to
re-insert their tuple (seeing our tuple, immediately after we wake as
the preferred backend with the older XID) in order to break the race.
But today, exclusion constraints are optimistic in that the insert
happens first, and only then the check. The pre-check turns that the
other way around, in a limited though necessary sense.


I'm not sure I understand exactly what you're saying, but AFAICS the 
pre-check doesn't completely solve that either. It's entirely possible 
that the other backend deletes its tuple, our backend then performs the 
pre-check, and the other backend re-inserts its tuple again. Sure, the 
pre-check reduces the chances, but we're talking about a rare condition 
to begin with, so I don't think it makes sense to add much code just to 
reduce the chances further.


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-19 Thread Heikki Linnakangas

On 02/16/2015 11:31 AM, Andres Freund wrote:

On 2015-02-16 10:00:24 +0200, Heikki Linnakangas wrote:

I'm starting to think that we should bite the bullet and consume an infomask
bit for this. The infomask bits are a scarce resource, but we should use
them when it makes sense. It would be good for forensic purposes too, to
leave a trace that a super-deletion happened.


Well, we IIRC don't have any left right now. We could reuse
MOVED_IN|MOVED_OUT, as that never was set in the past. But it'd
essentially use two infomask bits forever...


t_infomask is all used, but t_infomask2 has two bits left:


/*
 * information stored in t_infomask2:
 */
#define HEAP_NATTS_MASK		0x07FF	/* 11 bits for number of attributes */
/* bits 0x1800 are available */
#define HEAP_KEYS_UPDATED	0x2000	/* tuple was updated and key cols
									 * modified, or tuple deleted */
#define HEAP_HOT_UPDATED	0x4000	/* tuple was HOT-updated */
#define HEAP_ONLY_TUPLE		0x8000	/* this is heap-only tuple */

#define HEAP2_XACT_MASK		0xE000	/* visibility-related bits */


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-16 Thread Peter Geoghegan
On Mon, Feb 16, 2015 at 12:00 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint might
 fail. I don't like that. The point of having the command in the first place
 is to deal with concurrency issues. If it sometimes doesn't work, it's
 broken.

I don't like it either, although I think it wouldn't come up very
often with exclusion constraints - basically, it would rarely be
noticed due to the different use cases. To be honest, in suggesting
this idea I was hedging against us not figuring out a solution to that
problem in time. You didn't like my suggestion of dropping IGNORE
entirely, either. I'll do my best to come up with something, but I'm
uncomfortable that at this late stage, ON CONFLICT IGNORE support for
exclusion constraints seems like a risk to the entire project.

I ask that if push comes to shove you show some flexibility here. I'll
try my best to ensure that you don't have to even consider committing
something with a notable omission. You don't have to give me an answer
to this now.

 The idea of comparing the TIDs of the tuples as a tie-breaker seems most
 promising to me. If the conflicting tuple's TID is smaller than the inserted
 tuple's, super-delete and wait. Otherwise, wait without super-deletion.
 That's really simple. Do you see a problem with that?

No. I'll work on that, and see how it stands up to stress testing.
Come to think of it, that does seem most promising.

 BTW, it would be good to explain somewhere in comments or a README the term
 "unprincipled deadlock". It's been a useful concept, and hard to grasp. If
 you defined it once, with examples and everything, then you could just have
 "See .../README" in other places that need to refer to it.

Agreed. I have described those in the revised executor README, in case
you missed that. Do you think they ought to have their own section?
Note that hash indexes have unprincipled deadlock issues, but no one
has bothered to fix them.

 Whatever works, really. I can't say that the performance implications
 of acquiring that hwlock are at the forefront of my mind. I never
 found that to be a big problem on an 8 core box, relative to vanilla
 INSERTs, FWIW - lock contention is not normal, and may be where any
 heavyweight lock costs would really be encountered.

 Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).

Seems that way. But even if that wasn't true, it wouldn't matter,
since I don't see that we have a choice.

 I don't see how storing the promise token in heap tuples buys us not
 having to involve heavyweight locking of tokens. (I think you may have
 made a thinko in suggesting otherwise)

 It wouldn't get rid of heavyweight locks, but it would allow getting rid of
 the procarray changes. The inserter could generate a token, then acquire the
 hw-lock on the token, and lastly insert the heap tuple with the correct
 token.

Do you really think that's worth the disk overhead? I generally agree
with the zero overhead principle, which is that anyone not using the
feature shouldn't pay a price for it (or vanishingly close to no
price). Can't say that I have a good sense of the added distributed
cost (if any) to be paid by adding new fields to the PGPROC struct
(since the PGXACT struct was added in 9.2). Is your only concern that
the PGPROC array will now have more fields, making it more
complicated? Surely that's a price worth paying to avoid making these
heap tuples artificially somewhat larger. You're probably left with
tuples that are at least 8 bytes larger, once alignment is taken into
consideration. That doesn't seem great.

 That couldn't work without further HeapTupleSatisfiesDirty() logic.

 Why not?

Just meant that it wasn't sufficient to check xmin == xmax, while
allowing that something like that could work with extra work (e.g. the
use of infomask bits)...

 Ok, so we can't unconditionally always ignore tuples with xmin==xmax. We
 would need an infomask bit to indicate super-deletion, and only ignore it if
 the bit is set.

...which is what you say here.

 I'm starting to think that we should bite the bullet and consume an infomask
 bit for this. The infomask bits are a scarce resource, but we should use
 them when it makes sense. It would be good for forensic purposes too, to
 leave a trace that a super-deletion happened.

 I too think the tqual.c changes are ugly. I can't see a way around
 using a token lock, though - I would only consider (and only consider)
 refining the interface by which a waiter becomes aware that it must
 wait on the outcome of the inserting xact's speculative
 insertion/value lock (and in particular, that is should not wait on
 xact outcome). We clearly need something to wait on that isn't an
 XID...heavyweight locking of a token value is the obvious way of doing
 that.


 Yeah.

 Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit.
 Maybe he is right...if that can be made to be reliable (always
 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-16 Thread Peter Geoghegan
On Mon, Feb 16, 2015 at 4:11 PM, Peter Geoghegan p...@heroku.com wrote:
 Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit.
 Maybe he is right...if that can be made to be reliable (always
 WAL-logged), it could be marginally better than setting xmin to
 InvalidTransactionId.


 I'm not a big fan of that. The xmin-invalid bit is currently always just a
 hint.

 Well, Andres makes the point that that isn't quite so.

Hmm. So the tqual.c routines actually check if
(HeapTupleHeaderXminInvalid(tuple)). Which is:

#define HeapTupleHeaderXminInvalid(tup) \
( \
((tup)->t_infomask & (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)) == \
HEAP_XMIN_INVALID \
)

What appears to happen if I try to only use the HEAP_XMIN_INVALID bit
(and not explicitly set xmin to InvalidTransactionId, and not
explicitly check that xmin isn't InvalidTransactionId in each
HeapTupleSatisfies* routine) is that after a little while, Jeff Janes'
tool turns up spurious unique violations, as if some tuple has become
visible when it shouldn't have. I guess this is because the
HEAP_XMIN_COMMITTED hint bit can still be set, which in effect
invalidates the HEAP_XMIN_INVALID hint bit.

It takes about 2 minutes before this happens for the first time when
fsync = off, following a fresh initdb, which is unacceptable. I'm not
sure if it's worth trying to figure out how HEAP_XMIN_COMMITTED gets
set. Not that I'm 100% sure that that's what this really is; it just
seems very likely.

Attached broken patch (broken_visible.patch) shows the changes made to
reveal this problem. Only the changes to tqual.c and heap_delete()
should matter here, since I did not test recovery.

I also thought about unifying the check for if
(TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId)) with the conventional
HeapTupleHeaderXminInvalid() macro, and leaving everything else as-is.
This is no good either, and for similar reasons - control often won't
reach the macro, which is behind a check of if
(!HeapTupleHeaderXminCommitted(tuple)).

The best I think we can hope for is having a dedicated if
(TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
InvalidTransactionId)) macro HeapTupleHeaderSuperDeleted() to do the
work each time, which does not need to be checked so often. A second
attached patch (compact_tqual_works.patch - which is non-broken,
AFAICT) shows how this is possible, while also moving the check
further down within each tqual.c routine (which seems more in keeping
with the fact that super deletion is a relatively obscure concept). I
haven't been able to break this variant using my existing test suite,
so this seems promising as a way of reducing tqual.c clutter. However,
as I said the other day, this is basically just polish.

-- 
Peter Geoghegan
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 0aa3e57..b777c56 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2899,7 +2899,7 @@ l1:
 	}
 	else
 	{
-		HeapTupleHeaderSetXmin(tp.t_data, InvalidTransactionId);
+		HeapTupleHeaderSetXminInvalid(tp.t_data);
 	}
 
 	/* Make sure there is no forward chain link in t_ctid */
@@ -7382,12 +7382,12 @@ heap_xlog_delete(XLogReaderState *record)
		htup->t_infomask &= ~(HEAP_XMAX_BITS | HEAP_MOVED);
		htup->t_infomask2 &= ~HEAP_KEYS_UPDATED;
		HeapTupleHeaderClearHotUpdated(htup);
+		if (xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)
+			HeapTupleHeaderSetXminInvalid(htup);
+
		fix_infomask_from_infobits(xlrec->infobits_set,
								   &htup->t_infomask, &htup->t_infomask2);
-		if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE))
-			HeapTupleHeaderSetXmax(htup, xlrec->xmax);
-		else
-			HeapTupleHeaderSetXmin(htup, InvalidTransactionId);
+		HeapTupleHeaderSetXmax(htup, xlrec->xmax);
 		HeapTupleHeaderSetCmax(htup, FirstCommandId, false);
 
 		/* Mark the page as a candidate for pruning */
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 99bb417..fd857b1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -170,13 +170,6 @@ HeapTupleSatisfiesSelf(HeapTuple htup, Snapshot snapshot, Buffer buffer)
	Assert(ItemPointerIsValid(htup->t_self));
	Assert(htup->t_tableOid != InvalidOid);
 
-	/*
-	 * Never return super-deleted tuples
-	 */
-	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
-			InvalidTransactionId))
-		return false;
-
 	if (!HeapTupleHeaderXminCommitted(tuple))
 	{
 		if (HeapTupleHeaderXminInvalid(tuple))
@@ -735,15 +728,6 @@ HeapTupleSatisfiesDirty(HeapTuple htup, Snapshot snapshot,
	snapshot->xmin = snapshot->xmax = InvalidTransactionId;
	snapshot->speculativeToken = 0;
 
-	/*
-	 * Never return super-deleted tuples
-	 *
-	 * XXX:  Comment this code out and you'll get conflicts within
-	 * ExecLockUpdateTuple(), which result in an infinite loop.
-	 */
-	if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
-			InvalidTransactionId))
-		return false;
 
 	if 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-16 Thread Andres Freund
On 2015-02-16 10:00:24 +0200, Heikki Linnakangas wrote:
 On 02/16/2015 02:44 AM, Peter Geoghegan wrote:
 Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for
 every insertion? That seems bad for performance reasons. Also, are we happy
 with adding the new fields to the proc array? Another alternative that was
 discussed was storing the speculative insertion token on the heap tuple
 itself. (See
 http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com)
 
 Whatever works, really. I can't say that the performance implications
 of acquiring that hwlock are at the forefront of my mind. I never
 found that to be a big problem on an 8 core box, relative to vanilla
 INSERTs, FWIW - lock contention is not normal, and may be where any
 heavyweight lock costs would really be encountered.
 
 Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).

I don't think it actually uses the fast path, does it? IIRC that's
restricted to LOCKTAG_RELATION when done via LockAcquireExtended + open
coded for the VirtualXactLock table.

I'm not super worried atm either. Worth checking, but probably not worth
obsessing over.

 Besides, why should one transaction be entitled to insert a
 conflicting value tuple just because a still running transaction
 deleted it (having also inserted the tuple itself)? Didn't one
 prototype version of value locking #2 have that as a bug [1]? In fact,
 originally, didn't the "xmin set to invalid" thing come immediately
 from realizing that that wasn't workable?
 
 Ah, right. So the problem was that some code might assume that if you insert
 a row, delete it in the same transaction, and then insert the same value
 again, the 2nd insertion will always succeed because the previous insertion
 effectively acquired a value-lock on the key.
 
 Ok, so we can't unconditionally always ignore tuples with xmin==xmax. We
 would need an infomask bit to indicate super-deletion, and only ignore it if
 the bit is set.
 
 I'm starting to think that we should bite the bullet and consume an infomask
 bit for this. The infomask bits are a scarce resource, but we should use
 them when it makes sense. It would be good for forensic purposes too, to
 leave a trace that a super-deletion happened.

Well, we IIRC don't have any left right now. We could reuse
MOVED_IN|MOVED_OUT, as that never was set in the past. But it'd
essentially use two infomask bits forever...

 Jim Nasby said something about setting the HEAP_XMIN_INVALID hint bit.
 Maybe he is right...if that can be made to be reliable (always
 WAL-logged), it could be marginally better than setting xmin to
 InvalidTransactionId.
 
 I'm not a big fan of that. The xmin-invalid bit is currently always just a
 hint.

We already rely on XMIN_INVALID being set correctly for
freezing. C.f. HeapTupleHeaderXminFrozen, HeapTupleHeaderXminInvalid, et
al. So it'd not necessarily end up being that bad. And the super
deletion could easily just set it in the course of its WAL logging.
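
For concreteness, a rough sketch of what that could look like, mirroring the
shape of heap_delete()'s critical section (an assumption about the patch, not
its actual code; XLOG_HEAP_KILLED_SPECULATIVE_TUPLE is the patch's flag):

	START_CRIT_SECTION();

	/* mark the speculatively-inserted tuple dead to everyone */
	HeapTupleHeaderSetXmin(tp.t_data, InvalidTransactionId);
	tp.t_data->t_infomask &= ~HEAP_XMIN_COMMITTED;
	tp.t_data->t_infomask |= HEAP_XMIN_INVALID;

	MarkBufferDirty(buffer);

	if (RelationNeedsWAL(relation))
	{
		/* WAL-logged, so the bit is no longer just a hint */
		xlrec.flags |= XLOG_HEAP_KILLED_SPECULATIVE_TUPLE;
		/* ... build and XLogInsert() the xl_heap_delete record ... */
	}

	END_CRIT_SECTION();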

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-16 Thread Heikki Linnakangas

On 02/16/2015 02:44 AM, Peter Geoghegan wrote:

On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

I did not solve the potential for livelocks (discussed at
http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com).
The patch always super-deletes the already-inserted tuple on conflict, and
then waits for the other inserter. That would be nice to fix, using one of
the ideas from that thread, or something else.


How about we don't super-delete at all with exclusion constraints? I'd
be willing to accept unprincipled deadlocks for exclusion constraints,
because they already exist today for UPSERT/NOSERT type use cases, and
with idiomatic usage seem far less likely for the IGNORE variant
(especially with exclusion constraints).


So INSERT ON CONFLICT IGNORE on a table with an exclusion constraint 
might fail. I don't like that. The point of having the command in the 
first place is to deal with concurrency issues. If it sometimes doesn't 
work, it's broken.



I can see people using ON
CONFLICT UPDATE where they'd almost or actually be better off using a
plain UPDATE - that's quite a different situation. I find livelocks to
be a *very* scary prospect, and I don't think the remediations that
were mentioned are going to fly. It's just too messy, and too much of
a niche use case. TBH I am skeptical of the idea that we can fix
exclusion constraints properly in this way at all, at least not
without the exponential backoff thing, which just seems horrible.


The idea of comparing the TIDs of the tuples as a tie-breaker seems most 
promising to me. If the conflicting tuple's TID is smaller than the 
inserted tuple's, super-delete and wait. Otherwise, wait without 
super-deletion. That's really simple. Do you see a problem with that?
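
To make the control flow concrete, a sketch (not patch code; otherTid is the
conflicting tuple found by the exclusion check, ourTid the tuple we just
inserted, otherXid the other inserter's xid, and super_delete_self() a
hypothetical helper):

	if (ItemPointerCompare(otherTid, ourTid) < 0)
	{
		/* conflicting tuple's TID is smaller: back ours out, then wait */
		super_delete_self(relation, ourTid);
		XactLockTableWait(otherXid, relation, otherTid,
						  XLTW_RecheckExclusionConstr);
	}
	else
	{
		/* our TID is smaller: keep our tuple in place and just wait */
		XactLockTableWait(otherXid, relation, otherTid,
						  XLTW_RecheckExclusionConstr);
	}

At most one of any pair of conflicting inserters backs out, so two sessions
can never keep super-deleting for each other indefinitely.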



Although you understood what I was on about when I first talked about
unprincipled deadlocks, I think that acceptance of that idea came
much later from other people, because it's damn complicated.


BTW, it would be good to explain somewhere in comments or a README the term 
"unprincipled deadlock". It's been a useful concept, and hard to grasp. 
If you defined it once, with examples and everything, then you could 
just have "See .../README" in other places that need to refer to it.



It's not worth it to add
some weird "Dining philosophers" exponential backoff thing to make
sure that the IGNORE variant when used with exclusion constraints can
never deadlock in an unprincipled way, but if it is worth it then it
seems unreasonable to suppose that this patch needs to solve that
pre-existing problem. No?


The point of solving the existing problem is that it allows us to split 
the patch, for easier review.



Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for
every insertion? That seems bad for performance reasons. Also, are we happy
with adding the new fields to the proc array? Another alternative that was
discussed was storing the speculative insertion token on the heap tuple
itself. (See
http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com)


Whatever works, really. I can't say that the performance implications
of acquiring that hwlock are at the forefront of my mind. I never
found that to be a big problem on an 8 core box, relative to vanilla
INSERTs, FWIW - lock contention is not normal, and may be where any
heavyweight lock costs would really be encountered.


Oh, cool. I guess the fast-path in lmgr.c kicks ass, then :-).


I don't see how storing the promise token in heap tuples buys us not
having to involve heavyweight locking of tokens. (I think you may have
made a thinko in suggesting otherwise)


It wouldn't get rid of heavyweight locks, but it would allow getting rid 
of the procarray changes. The inserter could generate a token, then 
acquire the hw-lock on the token, and lastly insert the heap tuple with 
the correct token.
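
That ordering might look roughly like this (a sketch: the lock acquire and
release routines are assumptions modeled on lmgr naming, and the
token-stamping macro is hypothetical):

	uint32		specToken;

	/* 1. generate a token and take the heavyweight lock on it */
	specToken = SpeculativeInsertionLockAcquire(GetCurrentTransactionId());

	/* 2. stamp the token into the tuple header, then insert */
	HeapTupleHeaderSetSpeculativeToken(tuple->t_data, specToken);
	heap_insert(relation, tuple, GetCurrentCommandId(true),
				HEAP_INSERT_SPECULATIVE, NULL);

	/* ... check for conflicts; would-be waiters sleep on the token lock ... */

	/* 3. release once the insertion is confirmed (or super-deleted) */
	SpeculativeInsertionLockRelease(GetCurrentTransactionId());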



Are we happy with the way super-deletion works? Currently, the xmin field is
set to invalid to mark the tuple as super-deleted. That required checks in
HeapTupleSatisfies* functions. One alternative would be to set xmax equal to
xmin, and teach the code that currently calls XactLockTableWait() to check if
xmax=xmin, and not consider the tuple as conflicting.


That couldn't work without further HeapTupleSatisfiesDirty() logic.


Why not?
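
Presumably the waiter-side test would be something like this sketch (an
illustration of the idea only, not anyone's patch):

	/*
	 * Under the xmax == xmin convention, a would-be waiter first checks
	 * whether the tuple was killed by its own inserter.
	 */
	if (TransactionIdIsValid(HeapTupleHeaderGetRawXmax(tuple->t_data)) &&
		TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple->t_data),
							HeapTupleHeaderGetRawXmax(tuple->t_data)))
		return false;		/* not a conflict; nothing to wait on */

(Though note the discussion below: xmax == xmin also arises when a transaction
plainly deletes a row it inserted itself, which is why this couldn't be done
unconditionally.)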


Besides, why should one transaction be entitled to insert a
conflicting value tuple just because a still running transaction
deleted it (having also inserted the tuple itself)? Didn't one
prototype version of value locking #2 have that as a bug [1]? In fact,
originally, didn't the xmin set to invalid thing come immediately
from realizing that that wasn't workable?


Ah, right. So the problem was that some code might assume that if you 
insert a row, delete it in the same transaction, and then insert the 
same value again, the 2nd insertion will always succeed because the 
previous insertion effectively acquired a 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-15 Thread Peter Geoghegan
On Sat, Feb 14, 2015 at 2:06 PM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Thanks for taking a look at it. That's somewhat cleaned up in the
 attached patchseries - V2.2. This has been rebased to repair the minor
 bit-rot pointed out by Thom.

 I don't really have the energy to review this patch in one batch, so I'd
 really like to split this into two:

I think we're all feeling worn out at this point, by this patch and by
others. I do appreciate your making the effort.

 1. Solve the existing problem with exclusion constraints that if two
 insertions happen concurrently, one of them might be aborted with deadlock
 detected error, rather than conflicting key violation error. That really
 wouldn't be worth fixing otherwise, but it happens to be a useful stepping
 stone for this upsert patch.

 2. All the rest.

I think we need a more pragmatic approach to dealing with the
exclusion constraint problems.

 I took a stab at extracting the parts needed to do 1. See attached. I didn't
 modify ExecUpdate to use speculative insertions, so the issue remains for
 UPDATEs, but that's easy to add.

Cool.

 I did not solve the potential for livelocks (discussed at
 http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com).
 The patch always super-deletes the already-inserted tuple on conflict, and
 then waits for the other inserter. That would be nice to fix, using one of
 the ideas from that thread, or something else.

How about we don't super-delete at all with exclusion constraints? I'd
be willing to accept unprincipled deadlocks for exclusion constraints,
because they already exist today for UPSERT/NOSERT type use cases, and
with idiomatic usage seem far less likely for the IGNORE variant
(especially with exclusion constraints). I can see people using ON
CONFLICT UPDATE where they'd almost or actually be better off using a
plain UPDATE - that's quite a different situation. I find livelocks to
be a *very* scary prospect, and I don't think the remediations that
were mentioned are going to fly. It's just too messy, and too much of
a niche use case. TBH I am skeptical of the idea that we can fix
exclusion constraints properly in this way at all, at least not
without the exponential backoff thing, which just seems horrible.

 We never really debated the options for how to do the speculative insertion
 and super-deletion. This patch is still based on the quick & dirty demo
 patches I posted about a year ago, in response to issues you found with
 earlier versions. That might be OK - maybe I really hit the nail on
 designing those things and got them right on first try - but more likely
 there are better alternatives.

Intuitively, it seem likely that you're right here. However, it was
necessary to work through the approach to see what the problems are.
For example, the need for modifications to tqual.c became apparent
only through putting a full implementation of ON CONFLICT UPDATE
through various tests. In general, I've emphasized that the problems
with any given value locking implementation are non-obvious. Anyone
who thinks that he can see all the problems with his approach to value
locking without having a real implementation that is tested for
problems like unprincipled deadlocks is probably wrong.

That's sort of where I'm coming from with suggesting we allow
unprincipled deadlocks with exclusion constraints. I'm not confident
that we can find a perfect solution, and know that it's a perfect
solution. It's too odd, and too niche a requirement. Although you
understood what I was on about when I first talked about unprincipled
deadlocks, I think that acceptance of that idea came much later from
other people, because it's damn complicated. It's not worth it to add
some weird "Dining philosophers" exponential backoff thing to make
sure that the IGNORE variant when used with exclusion constraints can
never deadlock in an unprincipled way, but if it is worth it then it
seems unreasonable to suppose that this patch needs to solve that
pre-existing problem. No?

If we do something like an exponential backoff, which I think might
work, I fear that that kind of yak-shaving will leave us with
something impossible to justify committing; a total mess. Better the
devil you know (possible unprincipled deadlocks with the IGNORE
variant + exclusion constraints).

 Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock for
 every insertion? That seems bad for performance reasons. Also, are we happy
 with adding the new fields to the proc array? Another alternative that was
 discussed was storing the speculative insertion token on the heap tuple
 itself. (See
 http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com)

Whatever works, really. I can't say that the performance implications
of acquiring that hwlock are at the forefront of my mind. I never
found that to be a big problem on an 8 core box, relative to vanilla
INSERTs, FWIW - lock contention is not 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-15 Thread Peter Geoghegan
On Fri, Feb 13, 2015 at 7:22 PM, Jim Nasby jim.na...@bluetreble.com wrote:
 In patch 1, sepgsql is also affected by this commit.  Note that this commit
 necessitates an initdb, since stored ACLs are broken.

 Doesn't that warrant bumping catversion?

Yes, but traditionally that is left until the last minute - when the
patch is committed. That's why it's called out in the commit message
(it isn't otherwise obvious - it's not a common catversion
necessitating change like an addition to a system catalog).

 Patch 2
 + * When killing a speculatively-inserted tuple, we set xmin to invalid
 and
 +if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE))

 When doing this, should we also set the HEAP_XMIN_INVALID hint bit?

 <reads more of patch>

 Ok, I see we're not doing this because the check for a super deleted tuple
 is already cheap. Probably worth mentioning that in the comment...

See my remarks to Heikki on this. I don't think it adds much.

 ExecInsert():
 + * We don't suppress the effects (or, perhaps, side-effects) of
 + * BEFORE ROW INSERT triggers.  This isn't ideal, but then we
 + * cannot proceed with even considering uniqueness violations until
 + * these triggers fire on the one hand, but on the other hand they
 + * have the ability to execute arbitrary user-defined code which
 + * may perform operations entirely outside the system's ability to
 + * nullify.

 I'm a bit confused as to why we're calling BEFORE triggers out here...
 hasn't this always been true for both BEFORE *and* AFTER triggers? The
 comment makes me wonder if there's some magic that happens with AFTER...

Yes, but the difference here is that the UPDATE path could be taken
(which is sort of like when a before row insert trigger returns NULL).
What I'm calling out is the dependency on firing before row insert
triggers to decide if the alternative path must be taken. Roughly
speaking, we must perform part of the INSERT (firing of before row
insert triggers) in order to decide to do or not do an INSERT. This
is, as I said, similar to when those triggers return NULL, and won't
matter with idiomatic patterns of before trigger usage. Still feels
worth calling out, because sometimes users do foolishly write before
triggers with many external side-effects.

 ExecLockUpdateTuple():
 + * Try to lock tuple for update as part of speculative insertion.  If
 + * a qual originating from ON CONFLICT UPDATE is satisfied, update
 + * (but still lock row, even though it may not satisfy estate's
 + * snapshot).

 I find this confusing... which row is being locked? The previously inserted
 one? Perhaps a better wording would be "satisfied, update. Lock the original
 row even if it doesn't satisfy estate's snapshot."

Take a look at the executor README. We're locking the only row that
can be locked when an UPSERT non-conclusively thinks to take the
UPDATE path: the row that was found during our pre-check. We can only
UPDATE when we find such a tuple, and then lock it without finding a
row-level conflict.

 infer_unique_index():
 +/*
 + * We need not lock the relation since it was already locked, either by
 + * the rewriter or when expand_inherited_rtentry() added it to the query's
 + * rangetable.
 + */
 +relationObjectId = rt_fetch(parse->resultRelation, parse->rtable)->relid;
 +
 +relation = heap_open(relationObjectId, NoLock);

 Seems like there should be an Assert here. Also, the comment should probably
 go before the heap_open call.

An Assert() of what? Note that the similar function
get_relation_info() does about the same thing here.

 heapam_xlog.h:
 +/* reuse xl_heap_multi_insert-only bit for xl_heap_delete */
 I wish this would mention why it's safe to do this. Also, the comment
 mentions xl_heap_delete when the #define is for
 XLOG_HEAP_KILLED_SPECULATIVE_TUPLE; presumably that's wrong. Perhaps:
 /*
  * reuse XLOG_HEAP_LAST_MULTI_INSERT bit for
  * XLOG_HEAP_KILLED_SPECULATIVE_TUPLE. This is safe because we never do
  * multi-inserts for INSERT ON CONFLICT.
  */

It's safe, as the comment indicates, because the former is only used
for xl_heap_multi_insert records, while the latter is only used for
xl_heap_delete records. There is no ambiguity, because naturally we're
always able to establish what type of record is under consideration
before checking the bit is set.

The XLOG_HEAP_* format is used for other flags there, despite the fact
that other flags (like XLOG_HEAP_PREFIX_FROM_OLD) can only appear in
certain context-appropriate xl_heap_* records. AFAICT, all that I've
done that's new here is rely on that, since a bunch of those bits were
used up in the last year or two, and the need to even consider bit
reuse here is a new problem.
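
Spelled out as a sketch (the bit position is made up for illustration; only
the flag names come from heapam_xlog.h and the patch):

/* one bit position, two record-type-specific meanings */
#define XLOG_HEAP_LAST_MULTI_INSERT			(1<<7)	/* xl_heap_multi_insert only */
#define XLOG_HEAP_KILLED_SPECULATIVE_TUPLE	(1<<7)	/* xl_heap_delete only */

/* by the time either flag is tested, the record type is already known,
 * e.g. in the redo routine for deletes: */
if (xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE)
{
	/* replay as super-deletion: invalidate xmin instead of setting xmax */
}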

 I'll review the remaining patches later.

Thanks.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-14 Thread Heikki Linnakangas

On 02/10/2015 02:21 AM, Peter Geoghegan wrote:

On Fri, Feb 6, 2015 at 1:51 PM, Bruce Momjian br...@momjian.us wrote:

Other than the locking part, the biggest part of this patch is adjusting
things so that an INSERT can change into an UPDATE.


Thanks for taking a look at it. That's somewhat cleaned up in the
attached patchseries - V2.2. This has been rebased to repair the minor
bit-rot pointed out by Thom.


I don't really have the energy to review this patch in one batch, so I'd 
really like to split this into two:


1. Solve the existing problem with exclusion constraints that if two 
insertions happen concurrently, one of them might be aborted with 
deadlock detected error, rather than conflicting key violation 
error. That really wouldn't be worth fixing otherwise, but it happens to 
be a useful stepping stone for this upsert patch.


2. All the rest.

I took a stab at extracting the parts needed to do 1. See attached. I 
didn't modify ExecUpdate to use speculative insertions, so the issue 
remains for UPDATEs, but that's easy to add.


I did not solve the potential for livelocks (discussed at 
http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com). 
The patch always super-deletes the already-inserted tuple on conflict, 
and then waits for the other inserter. That would be nice to fix, using 
one of the ideas from that thread, or something else.


We never really debated the options for how to do the speculative 
insertion and super-deletion. This patch is still based on the quick & 
dirty demo patches I posted about a year ago, in response to issues you 
found with earlier versions. That might be OK - maybe I really hit the 
nail on designing those things and got them right on first try - but 
more likely there are better alternatives.


Are we happy with acquiring the SpeculativeInsertLock heavy-weight lock 
for every insertion? That seems bad for performance reasons. Also, are 
we happy with adding the new fields to the proc array? Another 
alternative that was discussed was storing the speculative insertion 
token on the heap tuple itself. (See 
http://www.postgresql.org/message-id/52d00d2d.6030...@vmware.com)


Are we happy with the way super-deletion works? Currently, the xmin 
field is set to invalid to mark the tuple as super-deleted. That 
required checks in HeapTupleSatisfies* functions. One alternative would 
be to set xmax equal to xmin, and teach the code that currently calls 
XactLockTableWait() to check if xmax=xmin, and not consider the tuple as 
conflicting.


- Heikki
diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 46060bc1..0aa3e575 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2048,6 +2048,9 @@ FreeBulkInsertState(BulkInsertState bistate)
  * This causes rows to be frozen, which is an MVCC violation and
  * requires explicit options chosen by user.
  *
+ * If HEAP_INSERT_SPECULATIVE is specified, the MyProc->specInsert fields
+ * are filled.
+ *
  * Note that these options will be applied when inserting into the heap's
  * TOAST table, too, if the tuple requires any out-of-line data.
  *
@@ -2196,6 +2199,13 @@ heap_insert(Relation relation, HeapTuple tup, CommandId cid,
 
 	END_CRIT_SECTION();
 
+	/*
+	 * Let others know that we speculatively inserted this tuple, before
+	 * releasing the buffer lock.
+	 */
+	if (options & HEAP_INSERT_SPECULATIVE)
+		SetSpeculativeInsertionTid(relation->rd_node, heaptup->t_self);
+
 	UnlockReleaseBuffer(buffer);
 	if (vmbuffer != InvalidBuffer)
 		ReleaseBuffer(vmbuffer);
@@ -2616,11 +2626,17 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask)
  * (the last only for HeapTupleSelfUpdated, since we
  * cannot obtain cmax from a combocid generated by another transaction).
  * See comments for struct HeapUpdateFailureData for additional info.
+ *
+ * If 'killspeculative' is true, caller requires that we super-delete a tuple
+ * we just inserted in the same command. Instead of the normal visibility
+ * checks, we check that the tuple was inserted by the current transaction and
+ * given command id.  Also, instead of setting its xmax, we set xmin to
+ * invalid, making it immediately appear as dead to everyone.
  */
 HTSU_Result
 heap_delete(Relation relation, ItemPointer tid,
 			CommandId cid, Snapshot crosscheck, bool wait,
-			HeapUpdateFailureData *hufd)
+			HeapUpdateFailureData *hufd, bool killspeculative)
 {
 	HTSU_Result result;
 	TransactionId xid = GetCurrentTransactionId();
@@ -2678,7 +2694,18 @@ heap_delete(Relation relation, ItemPointer tid,
 	tp.t_self = *tid;
 
 l1:
-	result = HeapTupleSatisfiesUpdate(tp, cid, buffer);
+	if (!killspeculative)
+	{
+		result = HeapTupleSatisfiesUpdate(tp, cid, buffer);
+	}
+	else
+	{
+		if (tp.t_data->t_choice.t_heap.t_xmin != xid ||
+			tp.t_data->t_choice.t_heap.t_field3.t_cid != cid)
+			elog(ERROR, "attempted to super-delete a tuple from other CID");
+		result = 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-13 Thread Michael Paquier
On Tue, Feb 10, 2015 at 9:21 AM, Peter Geoghegan p...@heroku.com wrote:

 * There was some squashing of commits, since Andres felt that they
 weren't all useful as separate commits. I've still split out the RTE
 permissions commit, as well as the RLS commit (plus the documentation
 and test commits, FWIW). I hope that this will make it easier to
 review parts of the patch, without being generally excessive.

 When documentation and tests are left out, the entire patch series is left
 at:


Patch moved to next CF.
-- 
Michael


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-13 Thread Jim Nasby

On 2/9/15 6:21 PM, Peter Geoghegan wrote:

Thanks for taking a look at it. That's somewhat cleaned up in the
attached patchseries - V2.2.


In patch 1, sepgsql is also affected by this commit.  Note that this commit
necessitates an initdb, since stored ACLs are broken.

Doesn't that warrant bumping catversion?

Patch 2
+ * When killing a speculatively-inserted tuple, we set xmin to invalid
and
+if (!(xlrec->flags & XLOG_HEAP_KILLED_SPECULATIVE_TUPLE))

When doing this, should we also set the HEAP_XMIN_INVALID hint bit?

<reads more of patch>

Ok, I see we're not doing this because the check for a super deleted 
tuple is already cheap. Probably worth mentioning that in the comment...


ExecInsert():
+ * We don't suppress the effects (or, perhaps, side-effects) of
+ * BEFORE ROW INSERT triggers.  This isn't ideal, but then we
+ * cannot proceed with even considering uniqueness violations until
+ * these triggers fire on the one hand, but on the other hand they
+ * have the ability to execute arbitrary user-defined code which
+ * may perform operations entirely outside the system's ability to
+ * nullify.

I'm a bit confused as to why we're calling BEFORE triggers out here... 
hasn't this always been true for both BEFORE *and* AFTER triggers? The 
comment makes me wonder if there's some magic that happens with AFTER...


+spec != SPEC_NONE? HEAP_INSERT_SPECULATIVE:0
Non-standard formatting. Given the size of the patch and work already 
into it I'd just leave it for the next formatting run; I only mention it 
in case someone has some compelling reason not to.


ExecLockUpdateTuple():
+ * Try to lock tuple for update as part of speculative insertion.  If
+ * a qual originating from ON CONFLICT UPDATE is satisfied, update
+ * (but still lock row, even though it may not satisfy estate's
+ * snapshot).

I find this confusing... which row is being locked? The previously 
inserted one? Perhaps a better wording would be "satisfied, update. Lock 
the original row even if it doesn't satisfy estate's snapshot."


infer_unique_index():
+/*
+ * We need not lock the relation since it was already locked, either by
+ * the rewriter or when expand_inherited_rtentry() added it to the query's
+ * rangetable.
+ */
+relationObjectId = rt_fetch(parse->resultRelation, parse->rtable)->relid;
+
+relation = heap_open(relationObjectId, NoLock);

Seems like there should be an Assert here. Also, the comment should 
probably go before the heap_open call.


heapam_xlog.h:
+/* reuse xl_heap_multi_insert-only bit for xl_heap_delete */
I wish this would mention why it's safe to do this. Also, the comment 
mentions xl_heap_delete when the #define is for 
XLOG_HEAP_KILLED_SPECULATIVE_TUPLE; presumably that's wrong. Perhaps:

/*
 * reuse XLOG_HEAP_LAST_MULTI_INSERT bit for
 * XLOG_HEAP_KILLED_SPECULATIVE_TUPLE. This is safe because we never do
 * multi-inserts for INSERT ON CONFLICT.
 */

I'll review the remaining patches later.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-10 Thread Andres Freund
On 2015-02-04 16:49:46 -0800, Peter Geoghegan wrote:
 On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund and...@2ndquadrant.com wrote:
  Generally the split into the individual commits doesn't seem to make
  much sense to me.

I think trying to make that possible is a good idea in patches of this
size. It e.g. seems entirely possible to structure the patchset so that
the speculative lock infrastructure is added first and the rest
later. I've not thought more about how to split it up further, but I'm
pretty sure it's possible.

  The commits individually (except the first) aren't
  individually committable and aren't even meaningful. Splitting off the
  internal docs, tests and such actually just seems to make reviewing
  harder because you miss context. Splitting it so that individual pieces
  are committable and reviewable makes sense, but... I have no problem
  doing the user docs later. If you split off RLS support, you need to
  throw an error before it's implemented.

 I mostly agree. Basically, I did not intend for all of the patches to
 be individually committed. The mechanism by which EXCLUDED.*
 expressions are added is somewhat novel, and deserves to be
 independently *considered*. I'm trying to show how the parts fit
  together more so than breaking things down into smaller commits (as
 you picked up on, 0001 is the exception - that is genuinely intended
 to be committed early). Also, those commit messages give me the
 opportunity to put those parts in their appropriate context vis-a-vis
 our discussions. They refer to the Wiki, for example, or reasons why
 pg_stat_statements shouldn't care about ExcludedExpr. Obviously the
 final commit messages won't look that way.

FWIW, I don't think anything here really should refer to the wiki...

  0002:
  * Tentatively I'd say that killspeculative should be done via a separate
function instead of heap_delete()

 Really? I guess if that were to happen, it would entail refactoring
 heap_delete() to call a static function, which was also called by a
 new kill_speculative() function that does this. Otherwise, you'd have
 far too much duplication.

I don't really think there actually is that much common inbetween
those. It looks to me that most of the code in heap_delete isn't
actually relevant for this case and could be cut short. My guess is that
only the WAL logging would be separated out.

  * I doubt logical decoding works with the patch as it stands.

 I thought so. Perhaps you could suggest a better use of the available
 XLOG_HEAP_* bits. I knew I needed to consider that more carefully
 (hence the XXX comment), but didn't get around to it.

I think you probably need to add test macros that make sure only the
 individual bits are set, and not the combination, and then only use those.

  * If an arbiter index is passed to ExecCheckIndexConstraints(), can't we
abort the loop after checking it? Also, do we really have to iterate
over indexes for that case? How about moving the loop contents to a
separate function and using that separately for the arbiter cases?

 Well, the failure to do that implies very few extra cycles, but sure.

It's not that much about the CPU cycles, but also about the mental
ones. If you have to think what happens if there's more than one
match...

  * ItemPointerIsValid

 What about it?

Uh. Oh. No idea. I wrote this pretty late at night ;)


  * /*
 * This may occur when an instantaneously invisible tuple is blamed
 * as a conflict because multiple rows are inserted with the same
 * constrained values.
 How can this happen? We don't insert multiple rows with the same
 command id?

 This is a cardinality violation [1]. It can definitely happen - just
 try the examples you see on the Wiki.

I don't care about the wiki from the point of code comments. This needs
to be understandable in five years time.

  * Perhaps it has previously been discussed but I'm not convinced by the
reasoning for not looking at opclasses in infer_unique_index(). This
seems like it'd prohibit ever having e.g. case insensitive opclasses -
   something surely worthwhile.

 I don't think anyone gave that idea the thumbs-up. However, I really
 don't see the problem. Sure, we could have case insensitive opclasses
 in the future, and you may want to make a unique index using one.

Then the problem suddenly becomes that previous choices of
indexes/statements aren't possible anymore. It seems much better to
introduce the syntax now and not have too much of a usecase for
it.


  * The whole speculative insert logic isn't really well documented. Why,
for example, do we actually need the token? And why are there no
issues with overflow? And where is it documented that a 0 means
there's no token? ...

 Fair enough. Presumably it's okay that overflow theoretically could
 occur, because a race is all but impossible. The token represents a
 particular attempt by some backend at inserting a tuple, that needs to
 be waited on 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-10 Thread Peter Geoghegan
On Tue, Feb 10, 2015 at 12:04 AM, Andres Freund and...@2ndquadrant.com wrote:
 FWIW, I don't think anything here really should refer to the wiki...

The Wiki pages have done a good job of summarizing things...it
certainly didn't seem that hard for you to get up to speed here. Also,
as you'll know from working on logical decoding, it's hard to know
what is complex and esoteric and what is relatively accessible when
you're this close to a big project. I recall that you said as much
before. I'm focused on signposting so that reviewers can follow what
each patch does with the minimum of effort (with reference to the Wiki
or whatever). I see no reason to not do things that way until
commit...it feels like there is less chance of the concepts going over
people's head this way.

 I don't really think there actually is that much common inbetween
 those. It looks to me that most of the code in heap_delete isn't
 actually relevant for this case and could be cut short. My guess is that
 only the WAL logging would be separated out.

I'll think about that some more.

  * I doubt logical decoding works with the patch as it stands.

 I thought so. Perhaps you could suggest a better use of the available
 XLOG_HEAP_* bits. I knew I needed to consider that more carefully
 (hence the XXX comment), but didn't get around to it.

 I think you probably need to add test macros that make sure only the
 individual bits are set, and not the combination, and then only use those.

This too.

  * /*
 * This may occur when an instantaneously invisible tuple is blamed
 * as a conflict because multiple rows are inserted with the same
 * constrained values.
 How can this happen? We don't insert multiple rows with the same
 command id?

 This is a cardinality violation [1]. It can definitely happen - just
 try the examples you see on the Wiki.

 I don't care about the wiki from the point of code comments. This needs
 to be understandable in five years time.

That wasn't clear before - you seemed to me to be questioning if this
was even possible. There is a section in the INSERT SQL reference page
about cardinality violations, so we're certainly talking about
something that a code reader likely heard of. Also, the nitty gritty
showing various scenarios on the Wiki is the quickest way to know what
is possible (but is much too long winded for user visible
documentation or code comments).

  * Perhaps it has previously been discussed but I'm not convinced by the
reasoning for not looking at opclasses in infer_unique_index(). This
seems like it'd prohibit ever having e.g. case insensitive opclasses -
 something surely worthwhile.

 I don't think anyone gave that idea the thumbs-up. However, I really
 don't see the problem. Sure, we could have case insensitive opclasses
 in the future, and you may want to make a unique index using one.

 Then the problem suddenly becomes that previous choices of
 indexes/statements aren't possible anymore. It seems much better to
 introduce the syntax now and not have too much of a use case for
 it.

The only way the lack of a way of specifying which opclass to use
could bite us is if there were two *actually* defined unique indexes
on the same column, each with different equality operators. That
seems like kind of a funny scenario, even if that were quite possible
(even if non-default opclasses existed that had a non-identical
equality operators, which is basically not the case today).

I grant that is a bit odd that we're talking about unique indexes
definitions affecting semantics, but that is to a certain extent the
nature of the beast. As a compromise, I suggest having the inference
specification optionally accept a named opclass per attribute, in the
style of CREATE INDEX (I'm already reusing a bit of the raw parser
support for CREATE INDEX, you see) - that'll make inference insist on
that opclass, rather than make it a strict matter of costing available
alternatives (not that any alternative is expected with idiomatic
usage). That implies no additional parser overhead, really. If that's
considered ugly, then at least it's an ugly thing that literally no
one will ever use in the foreseeable future...and an ugly thing that
is no more necessary in CREATE INDEX than here (and yet CREATE INDEX
lives with the ugliness).

 It's really not about me understanding it right now, but about longer term.

Sure.

 Can you add a UPSERT test for logical decoding? I doubt it'll work right
 now, even in the repost.

Okay.

  * /*
  * Immediately VACUUM super-deleted tuples
  */
  if (TransactionIdEquals(HeapTupleHeaderGetRawXmin(tuple),
  InvalidTransactionId))
  return HEAPTUPLE_DEAD;
Is that branch really needed? Shouldn't it just be happening as a
consequence of the already existing code? Same in SatisfiesMVCC. If
you actually needed that block, it'd need to be done in SatisfiesSelf
as well, no? You have a comment about a possible loop - but that seems
wrong to me, 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-06 Thread Thom Brown
On 29 January 2015 at 23:38, Peter Geoghegan p...@heroku.com wrote:

 On Sat, Jan 17, 2015 at 6:48 PM, Peter Geoghegan p...@heroku.com wrote:
  I continued with this since posting V2.0.

 Attached version (V2.1) fixes bit-rot caused by the recent changes by
 Stephen (Fix column-privilege leak in error-message paths). More
 precisely, it is rebased on top of today's 17792b commit.


Patch 0002 no longer applies due to a conflict in
src/backend/executor/execUtils.c.

Thom


Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-04 Thread Heikki Linnakangas

On 02/03/2015 08:17 PM, Peter Geoghegan wrote:

On Tue, Feb 3, 2015 at 2:05 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

TRAP: FailedAssertion(!(node->spec != SPEC_INSERT || node->arbiterIndex !=
((Oid) 0)), File: "nodeModifyTable.c", Line: 1619)

Is that just because of the hack in parse_clause.c?


Yes. I never built with assertions and so didn't see this, but it
doesn't matter.


With assertions disabled, count_upsert_exclusion.pl ran successfully to the
end. I also tried running VACUUM FREEZE upsert_race_test in a loop in
another session at the same time, but it didn't make a difference. How
quickly do you see the errors?

I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set
jj_xid=1 to increase XID burn rate, but I'm still not seeing any errors.


Did you build fully optimized, assertion-free code? I've been doing
that. I found it necessary to recreate some of the bugs Jeff's tool
caught. I also think that I might have needed an 8 core box to see it,
but less sure about that.


I had compiled with -O0, but without assertions. I tried now again with 
-O3. It's been running for about 10 minutes now, and I haven't seen any 
errors.


Since you can reproduce this, it would be good if you could debug this. 
The error message where the alleged duplicate key actually had a 
different value is a bit scary. Makes me wonder if it might be a bug 
with exclusion constraints in general, or just with the patch.


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-04 Thread Peter Geoghegan
On Wed, Feb 4, 2015 at 9:54 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I had compiled with -O0, but without assertions. I tried now again with -O3.
 It's been running for about 10 minutes now, and I haven't seen any errors.

Did you run with an artificially high XID burn rate (i.e. did you also
apply Jeff's modifications to Postgres, and specify a high burn rate
using his custom GUC)? Maybe that was important.


-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-04 Thread Peter Geoghegan
On Tue, Feb 2, 2015 at 01:06 AM, Andres Freund and...@2ndquadrant.com wrote:
 A first (not actually that quick :() look through the patches to see
 what actually happened in the last months. I didn't keep up with the
 thread.

So, let me get this out of the way: This is the first in-depth
technical review that this work has had in a long time. Thank you for
your help here.

 Generally the split into the individual commits doesn't seem to make
 much sense to me. The commits individually (except the first) aren't
  individually committable and aren't even meaningful. Splitting off the
  internal docs, tests and such actually just seems to make reviewing
  harder because you miss context. Splitting it so that individual pieces
  are committable and reviewable makes sense, but... I have no problem
  doing the user docs later. If you split off RLS support, you need to
 throw an error before it's implemented.

I mostly agree. Basically, I did not intend for all of the patches to
be individually committed. The mechanism by which EXCLUDED.*
expressions are added is somewhat novel, and deserves to be
independently *considered*. I'm trying to show how the parts fit
together more so than breaking things down into smaller commits (as
you picked up on, 0001 is the exception - that is genuinely intended
to be committed early). Also, those commit messages give me the
opportunity to put those parts in their appropriate context vis-a-vis
our discussions. They refer to the Wiki, for example, or reasons why
pg_stat_statements shouldn't care about ExcludedExpr. Obviously the
final commit messages won't look that way.

 0001:
 * References INSERT with ON CONFLICT UPDATE, can thus not be committed
   independently. I don't think those references really are needed.
 * I'm not a fan of the increased code duplication in
   ExecCheckRTEPerms(). Can you look into cleaning that up?
 * Also the comments inside the ACL_INSERT case still reference UPDATE.

 Other than that I think we can just go ahead and commit this ahead of
 time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit
 message only.

Cool. Attached revision makes those changes.

 0007:
 * "AMs" alone isn't particularly unique.
 * Without the context of the discussion unprincipled deadlocks aren't
   well defined.
 * Too many  words.
 * Waiting too long isn't defined. Neither is why that'd imply
   unprincipled deadlocks. Somewhat understandable with the context of
   the discussion, but surely not a couple years down the road.
 * As is I don't find the README entry super helpful. It should state
   what the reason for doing this is clearly, start at the higher level,
   and then move to the details.
 * Misses details about the speculative heavyweight locking of tuples.

Fair points. I'll work through that feedback.

Actually, I think we should memorialize that unprincipled deadlocks
should be avoided in some more general way, since they are after all a
general problem that we've seen elsewhere. I'm not sure about how to
go about doing that, though.

 0002:
 * Tentatively I'd say that killspeculative should be done via a separate
   function instead of heap_delete()

Really? I guess if that were to happen, it would entail refactoring
heap_delete() to call a static function, which was also called by a
new kill_speculative() function that does this. Otherwise, you'd have
far too much duplication.

 * I think we should, as you ponder in a comment, do the OCU specific
   stuff lazily and/or in a separate function from BuildIndexInfo(). That
   function is already quite visible in profiles, and the additional work
   isn't entirely trivial.

Okay.

 * I doubt logical decoding works with the patch as it stands.

I thought so. Perhaps you could suggest a better use of the available
XLOG_HEAP_* bits. I knew I needed to consider that more carefully
(hence the XXX comment), but didn't get around to it.

 * The added ereport (i.e. user facing) error message in
   ExecInsertIndexTuples won't help a user at all.

So, this:

/* Skip this index-update if the predicate isn't satisfied */
if (!ExecQual(predicate, econtext, false))
 + {
 + if (arbiterIdx == indexRelation->rd_index->indexrelid)
 + ereport(ERROR,
 +  (errcode(ERRCODE_TRIGGERED_ACTION_EXCEPTION),
 +   errmsg("partial arbiter unique index has predicate 
 that does not cover tuple proposed for insertion"),
 +   errdetail("ON CONFLICT inference clause implies that 
 the tuple proposed for insertion actually be covered by partial predicate 
 for index \"%s\".",
 + 
 RelationGetRelationName(indexRelation)),
 +   errhint("ON CONFLICT inference clause must infer a 
 unique index that covers the final tuple, after BEFORE ROW INSERT triggers 
 fire."),
 +   errtableconstraint(heapRelation,
 +
 RelationGetRelationName(indexRelation;
 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-04 Thread Peter Geoghegan
On Wed, Feb 4, 2015 at 10:03 AM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Feb 4, 2015 at 9:54 AM, Heikki Linnakangas
 hlinnakan...@vmware.com wrote:
 I had compiled with -O0, but without assertions. I tried now again with -O3.
 It's been running for about 10 minutes now, and I haven't seen any errors.

 Did you run with an artificially high XID burn rate (i.e. did you also
 apply Jeff's modifications to Postgres, and specify a high burn rate
 using his custom GUC)? Maybe that was important.

Excuse me: I now see that you specifically indicated that you did. But it
looks like your XID burn rate was quite a lot higher than mine
(assuming that you were consistent in using  jj_xid=1, although
I'm not asserting that that was significant).

I attach a log of output from an example session where exclusion
constraints are shown to break (plus the corresponding server log,
plus /proc/cpuinfo on the off chance that that's significant). As you
can see from the fact that the span of time recorded in the log is only a
couple of minutes, this is really easy for me to
recreate... sometimes. I could not recreate the problem with only 4
clients (on this 8 core server) after a few dozen attempts, and then I
couldn't recreate the issue at all, so clearly those details matter.
It might have something to do with CPU scaling, which I've found can
significantly affect outcomes for things like this (looks like my
hosting provider changed settings in the system BIOS recently, such
that I cannot set the CPU governor to performance).

Perhaps you could take a crack at recreating this, Jeff?

Thanks
-- 
Peter Geoghegan
Example run of jjanes_upsert tool:

pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8
[Thu Feb  5 04:38:36 2015] NOTICE:  extension btree_gist already exists, 
skipping
[Thu Feb  5 04:38:36 2015] init done at count_upsert_exclusion.pl line 106.
pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8 1
[Thu Feb  5 04:38:56 2015] NOTICE:  extension btree_gist already exists, 
skipping
[Thu Feb  5 04:38:56 2015] init done at count_upsert_exclusion.pl line 106.
[Thu Feb  5 04:39:01 2015] sum is 528
[Thu Feb  5 04:39:01 2015] count is 8740
[Thu Feb  5 04:39:01 2015] For tuple with index value 1, 6 != 3 at 
count_upsert_exclusion.pl line 174.
pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8 1
[Thu Feb  5 04:39:08 2015] NOTICE:  extension btree_gist already exists, 
skipping
[Thu Feb  5 04:39:08 2015] init done at count_upsert_exclusion.pl line 106.
pg@gerbil:~/jjanes_upsert$ perl count_upsert_exclusion.pl 8 1
[Thu Feb  5 04:39:20 2015] NOTICE:  extension btree_gist already exists, 
skipping
[Thu Feb  5 04:39:20 2015] init done at count_upsert_exclusion.pl line 106.

Corresponding PostgreSQL server logs (note: This uses the PDT timezone, not UTC 
as above):

pg@gerbil:~/postgresql$ pg_ctl start -o "--fsync=off --JJ_xid=50 --JJ_vac=1" -w 
2>&1 | tee postgres.log
waiting for server to start....31541 2015-02-04 19:36:02 PST LOG:  database 
system was shut down at 2015-02-04 19:35:12 PST
31541 2015-02-04 19:36:02 PST LOG:  JJ transaction ID wrap limit is 2147484359, 
limited by database with OID 1
31540 2015-02-04 19:36:02 PST LOG:  database system is ready to accept 
connections
 done
server started
31706 2015-02-04 19:38:42 PST LOG:  duration: 161.921 ms  statement: insert 
into upsert_race_test (index, count) values ('4751','1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning upsert_race_test.count
31697 2015-02-04 19:38:42 PST LOG:  duration: 161.900 ms  statement: insert 
into upsert_race_test (index, count) values ('6931','-1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning upsert_race_test.count
31695 2015-02-04 19:38:42 PST LOG:  duration: 161.966 ms  statement: insert 
into upsert_race_test (index, count) values ('5679','-1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning upsert_race_test.count
31707 2015-02-04 19:38:42 PST LOG:  duration: 161.959 ms  statement: insert 
into upsert_race_test (index, count) values ('5876','1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning upsert_race_test.count
31705 2015-02-04 19:38:42 PST LOG:  duration: 162.009 ms  statement: insert 
into upsert_race_test (index, count) values ('1507','1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning upsert_race_test.count
31704 2015-02-04 19:38:42 PST LOG:  duration: 162.058 ms  statement: insert 
into 

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-03 Thread Peter Geoghegan
On Tue, Feb 3, 2015 at 2:05 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 TRAP: FailedAssertion(!(node->spec != SPEC_INSERT || node->arbiterIndex !=
  ((Oid) 0)), File: "nodeModifyTable.c", Line: 1619)

 Is that just because of the hack in parse_clause.c?

Yes. I never built with assertions and so didn't see this, but it
doesn't matter.

 With assertions disabled, count_upsert_exclusion.pl ran successfully to the
 end. I also tried running VACUUM FREEZE upsert_race_test in a loop in
 another session at the same time, but it didn't make a difference. How
 quickly do you see the errors?

 I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and set
 jj_xid=1 to increase XID burn rate, but I'm still not seeing any errors.

Did you build fully optimized, assertion-free code? I've been doing
that. I found it necessary to recreate some of the bugs Jeff's tool
caught. I also think that I might have needed an 8 core box to see it,
but less sure about that.

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-03 Thread Heikki Linnakangas

On 01/30/2015 01:38 AM, Peter Geoghegan wrote:

On the stress-testing front, I'm still running Jeff Janes' tool [1],
while also continuing to use his Postgres modifications to
artificially increase the XID burn rate.

[1] https://github.com/petergeoghegan/jjanes_upsert


I followed the instructions in README.md to reproduce this. I downloaded 
the tool, applied the upsert patchset, applied the hack to 
parse_clause.c as instructed in the README.md file, installed 
btree_gist, and ran count_upsert_exclusion.pl.


It failed immediately with an assertion failure:

TRAP: FailedAssertion(!(node->spec != SPEC_INSERT || node->arbiterIndex 
!= ((Oid) 0)), File: "nodeModifyTable.c", Line: 1619)


Is that just because of the hack in parse_clause.c?

With assertions disabled, count_upsert_exclusion.pl ran successfully to 
the end. I also tried running VACUUM FREEZE upsert_race_test in a loop 
in another session at the same time, but it didn't make a difference. 
How quickly do you see the errors?


I also tried applying crash_REL9_5.patch from the jjanes_upsert kit, and 
set jj_xid=1 to increase XID burn rate, but I'm still not seeing any 
errors.


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-02 Thread Peter Geoghegan
On Mon, Feb 02, 2015 at 4:48 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 I think that the fundamental, unfixable race condition here is the
 disconnect between index tuple insertion and checking for would-be
 exclusion violations that exclusion constraints naturally have here,
 that unique indexes naturally don't have [1] (note that I'm talking
 only about approach #2 to value locking here; approach #1 isn't in
 V2.0). I suspect that the feature is not technically feasible to make
 work correctly with exclusion constraints, end of story. VACUUM
 interlocking is probably also involved here, but the unfixable race
 condition seems like our fundamental problem.

 It's not a fundamental, unfixable race condition. In [1], I gave you
 three ideas straight off the top of my head on how that could be fixed.

That was different - I tried to make it work by fixing some bugs
there. However, I'm now finding myself up against these new bugs. I
think that the underlying cause is the lack of any real locking
(unlike with the B-Tree AM) in *both* cases, but I don't even know
that for sure. The error messages you see are quite odd - why should a
btree_gist-based exclusion constraint cause a violation when
non-conflicting values are inserted? There is some other race
condition here. This wasn't a livelock (or a deadlock), which is what
your comments in early January apply to. I think that this has
something to do with VACUUM interlocking. But with the B-Tree AM
(which we're handling differently, by re-using infrastructure used for
deferred unique constraints), things work quite well. The patch stands
up to Jeff's vigorous stress-tests.

I'm not fundamentally in disagreement with you about any of this. All
I'm saying is that we should cut scope today. We're not precluding
picking up an IGNORE feature that does support exclusion constraints
in the future. Why should we insist upon having that in the first cut?
It's both significantly harder, and significantly less useful to
users, and so cutting that makes perfect sense AFAICT. As I've said
many times, exclusion constraint support was only ever going to be
useful to the IGNORE variant (I've tested exclusion constraints by
contriving a case to make them do UPSERTs, but this is only for the
benefit of the stress-test).

Thanks
-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-02 Thread Geoff Winkless
On 30 January 2015 at 21:58, Peter Geoghegan p...@heroku.com wrote:
 On Fri, Jan 30, 2015 at 6:59 AM, Geoff Winkless pgsqlad...@geoff.dj wrote:
 I suppose there's no reason why we couldn't use a no-op ON CONFLICT
 UPDATE anyway

 Right. IGNORE isn't really all that compelling for that reason. Note
 that this will still lock the unmodified row, though.

Mmmf. So I would have to make sure that my source tuples were unique
before doing the INSERT (otherwise the first ON CONFLICT UPDATE for a
tuple would block any other)? That's potentially very slow :(

When you say that you can't add exclusion constraints later, do you
mean from a coding point of view or just because people would get
confused whether exclusion constraints could be IGNOREd or not?

Geoff




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-02 Thread Geoff Winkless
On 2 February 2015 at 14:32, Geoff Winkless pgsqlad...@geoff.dj wrote:
 Mmmf. So I would have to make sure that my source tuples were unique
 before doing the INSERT (otherwise the first ON CONFLICT UPDATE for a
 tuple would block any other)? That's potentially very slow :(

Replying to my own message, because it occurs to me I might be being
stupid (surely not :) )

When you say "this will still lock the unmodified row", did you mean
just that it's locked to _other_ processes until commit? That would be
much less impactful.

Geoff




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-02 Thread Heikki Linnakangas

On 01/18/2015 04:48 AM, Peter Geoghegan wrote:

I think that the fundamental, unfixable race condition here is the
disconnect between index tuple insertion and checking for would-be
exclusion violations that exclusion constraints naturally have here,
that unique indexes naturally don't have [1] (note that I'm talking
only about approach #2 to value locking here; approach #1 isn't in
V2.0). I suspect that the feature is not technically feasible to make
work correctly with exclusion constraints, end of story. VACUUM
interlocking is probably also involved here, but the unfixable race
condition seems like our fundamental problem.


It's not a fundamental, unfixable race condition. In [1], I gave you 
three ideas straight off the top of my head on how that could be fixed.



Please work with me towards a committable patch.


I'm trying...


[1] http://www.postgresql.org/message-id/54a7c76d.3070...@vmware.com


- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-02 Thread Heikki Linnakangas

On 01/30/2015 01:38 AM, Peter Geoghegan wrote:

I have not addressed the recently described problems with exclusion
constraints. I hope we can do so shortly. Simply removing IGNORE
support until such time as we straighten that all out (9.6?) seems
like the simplest solution. No need to block the progress of UPSERT,
since exclusion constraint support was only ever going to be useful
for the less compelling IGNORE variant. What do other people think? Do
you agree with my view that we should shelve IGNORE support for now,
Heikki?


No, I don't agree. Let's fix it.

- Heikki





Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-02-01 Thread Andres Freund
Hi,

A first (not actually that quick :() look through the patches to see
what actually happened in the last months. I didn't keep up with the
thread.

Generally the split into the individual commits doesn't seem to make
much sense to me. The commits (except the first) aren't individually
committable and aren't even meaningful on their own. Splitting off the
internal docs, tests and such actually just seems to make reviewing
harder because you miss context. Splitting it so that individual
pieces are committable and reviewable makes sense, but... I have no
problem doing the user docs later. If you split off RLS support, you
need to throw an error before it's implemented.

0001:
* References INSERT with ON CONFLICT UPDATE, can thus not be committed
  independently. I don't think those references really are needed.
* I'm not a fan of the increased code duplication in
  ExecCheckRTEPerms(). Can you look into cleaning that up?
* Also the comments inside the ACL_INSERT case still reference UPDATE.

Other than that I think we can just go ahead and commit this ahead of
time. Mentioning ON CONFLICT UPDATE (OCU henceforth) in the commit
message only.

0007:
* "AMs" alone isn't particularly unique.
* Without the context of the discussion, "unprincipled deadlocks" aren't
  well defined.
* Too many words.
* "Waiting too long" isn't defined. Neither is why that'd imply
  unprincipled deadlocks. Somewhat understandable with the context of
  the discussion, but surely not a couple of years down the road.
* As is, I don't find the README entry super helpful. It should state
  the reason for doing this clearly, start at the higher level,
  and then move to the details.
* Misses details about the speculative heavyweight locking of tuples.

0002:
* Tentatively I'd say that killspeculative should be done via a separate
  function instead of heap_delete()
* I think we should, as you ponder in a comment, do the OCU specific
  stuff lazily and/or in a separate function from BuildIndexInfo(). That
  function is already quite visible in profiles, and the additional work
  isn't entirely trivial.
* I doubt logical decoding works with the patch as it stands.
* The added ereport (i.e. user facing) error message in
  ExecInsertIndexTuples won't help a user at all.
* Personally I don't care one iota for comments like "Get information
  from the result relation info structure". Yes, one of these already
  exists, but ...
* If an arbiter index is passed to ExecCheckIndexConstraints(), can't we
  abort the loop after checking it? Also, do we really have to iterate
  over indexes for that case? How about moving the loop contents to a
  separate function and using that separately for the arbiter cases?
* Don't like the comment above check_exclusion_or_unique_constraint()
  much. Makes too much of a special case of OCU.
* ItemPointerIsValid
* ExecCheckHeapTupleVisible's comment "It is not acceptable to proceed
  ..." sounds like you're talking with a child or so ;)
* ExecCheckHeapTupleVisible()'s errhint() sounds like an
  argument/excuse (actually like a code comment). That's not going to
  help a user at all.
* I find the modified control flow in ExecInsert() pretty darn ugly. I
  think this needs to be cleaned up. The speculative case should imo be
  a separate function or something.
* /*
   * This may occur when an instantaneously invisible tuple is blamed
   * as a conflict because multiple rows are inserted with the same
   * constrained values.
   */
  How can this happen? We don't insert multiple rows with the same
  command id?
* ExecLockUpdatedTuple() has (too?) many comments, but little overview
  of what/why it is doing what it does on a higher level.

* plan_speculative_use_index: "Use the planner to decide speculative
  insertion arbiter index" - Huh? "rel is the target to undergo ON
  CONFLICT UPDATE/IGNORE." - Which rel?
* Formulations like "fundamental nexus" are hard to understand imo.
* Perhaps it has previously been discussed, but I'm not convinced by the
  reasoning for not looking at opclasses in infer_unique_index(). This
  seems like it'd prohibit ever having e.g. case-insensitive opclasses -
  something surely worthwhile.
* Doesn't infer_unique_index() have to look for indisvalid? This isn't
  going to work well with an invalid (not to speak of a !ready) index.
* Is ->relation in the UpdateStmt generated in transformInsertStmt ever
  used for anything? If so, it'd possibly generate some nastiness due
  to repeated name lookups. Looks like it'll be used in
  transformUpdateStmt.
* What's the reason for the !pstate->p_parent_cte? Also why the parens?
  pstate->p_is_speculative = (pstate->parentParseState &&
                              (!pstate->p_parent_cte &&
                               pstate->parentParseState->p_is_insert &&
                               pstate->parentParseState->p_is_speculative));
* Why did you need to make %nonassoc  

Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-01-30 Thread Peter Geoghegan
On Fri, Jan 30, 2015 at 6:59 AM, Geoff Winkless pgsqlad...@geoff.dj wrote:
 I appreciate the work you're doing and (as a user rather than a
 pg-hacker) don't want to butt in but if it would be possible to allow
 support for IGNORE for unique but not exclusion constraints that would
 be really helpful for my own use cases, where being able to insert
 from a dataset into a table containing unique constraints without
 having to first check the dataset for uniqueness (within both itself
 and the target table) is very useful.

 It's possible that I've misunderstood anyway and that you mean purely
 that exclusion constraint IGNORE should be shelved until 9.6, in which
 case I apologise.

Well, the issue is that we can't really add exclusion constraints to
the IGNORE case later. So the fact that we cannot do exclusion
constraints kind of implies that we can either shelve IGNORE and maybe
look at it later, or accept that we'll never support exclusion
constraints with IGNORE. We'd then include IGNORE without exclusion
constraint support now and forever. I tend to think that we'll end up
doing the latter anyway, but I really don't want to add additional
risk of this not getting into 9.5 by arguing about that now. It
doesn't matter that much.

 I suppose there's no reason why we couldn't use a no-op ON CONFLICT
 UPDATE anyway

Right. IGNORE isn't really all that compelling for that reason. Note
that this will still lock the unmodified row, though.
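
For concreteness, a minimal sketch of that no-op pattern, in the
TARGET/EXCLUDED spelling the V2.0 patch uses (the table and column
names here are made up for illustration):

insert into t (k, v) values (1, 'x')
  on conflict
  update set v = TARGET.v
  where TARGET.k = EXCLUDED.k;

The assignment changes nothing, so the statement behaves like IGNORE
for the conflicting row, but that row stays locked until the
transaction commits or aborts, just as if it had really been updated.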

-- 
Peter Geoghegan




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-01-30 Thread Geoff Winkless
On Thu, Jan 29, 2015 at 11:38 PM, Peter Geoghegan pg(at)heroku(dot)com wrote:
 Simply removing IGNORE support until such time as we straighten
 that all out (9.6?) seems like the simplest solution. No need to block
 the progress of UPSERT, since exclusion constraint support was
 only ever going to be useful for the less compelling IGNORE variant.
 What do other people think? Do you agree with my view that we should
 shelve IGNORE support for now, Heikki?

I appreciate the work you're doing and (as a user rather than a
pg-hacker) don't want to butt in but if it would be possible to allow
support for IGNORE for unique but not exclusion constraints that would
be really helpful for my own use cases, where being able to insert
from a dataset into a table containing unique constraints without
having to first check the dataset for uniqueness (within both itself
and the target table) is very useful.
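
Concretely, the pattern I have in mind is something like this (using
the patch's proposed IGNORE syntax; staging_data and target_table are
made-up names):

insert into target_table (k, v)
  select k, v from staging_data
  on conflict ignore;

That is, any row that would violate the unique constraint, whether
against an existing row in the target table or against a row inserted
earlier by the same statement, would simply be skipped.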

It's possible that I've misunderstood anyway and that you mean purely
that exclusion constraint IGNORE should be shelved until 9.6, in which
case I apologise.

Of course if there's no way to allow unique constraint IGNORE without
exclusion constraints then just ignore me; I (along I'm sure with all
the others who are following this conversation from afar) will be
incredibly grateful for the work you've done either way.

I suppose there's no reason why we couldn't use a no-op ON CONFLICT
UPDATE anyway, but that does seem rather messy and (I imagine) would
involve rather more work (unless the optimizer were able to optimize
away the update? I don't know enough to be able to say if it would).

Thanks

Geoff




Re: [HACKERS] INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0

2015-01-17 Thread Peter Geoghegan
On Sat, Jan 10, 2015 at 8:32 PM, Peter Geoghegan p...@heroku.com wrote:
 I also include various bugfixes to approach #2 to value locking (these
 were all previously separately posted, but are now integrated into the
 main ON CONFLICT commit). Specifically, these are fixes for the bugs
 that emerged thanks to Jeff Janes' great work on stress testing [4].
 With these fixes, I have been unable to reproduce any problem with
 this patch with the test suite, even after many days of running the
 script on a quad-core server, with constant concurrent VACUUM runs,
 etc.

I continued with this since posting V2.0. I've run this bash script,
which invokes Jeff's script at various client counts, with runs of
various durations (since each client does a fixed amount of work):

https://github.com/petergeoghegan/jjanes_upsert/blob/master/run_test.sh

As previously discussed, Jeff's script comprehensively verifies the
correctness of the final values of a few thousand rows within a table
after many concurrent upserts, within and across upserting sessions,
and with concurrent deletions, too.

When building Postgres for this stress test, I included Jeff's
modifications that increase the XID burn rate artificially (I chose a
burn rate of X50). This makes anti-wraparound VACUUMs much more
frequent. I'm also looking out for outlier query execution durations,
because in theory they could indicate an unknown lock starvation
problem. I haven't seen any notable outliers after over a week of
testing.

 I think that we still need to think about the issues that
 transpired with exclusion constraints, but since I couldn't find
 another problem with an adapted version of Jeff's tool that tested
 exclusion constraints, I'm inclined to think that it should be
 possible to support exclusion constraints for the IGNORE variant.

Exclusion constraints were my focus with stress testing today. I
performed equivalent verification of upserts using exclusion
constraints (this is a hack; exclusion constraints are only intended
to be used with the IGNORE variant, but I get better test coverage
than I might otherwise this way). Unfortunately, even with the recent
bugfixes, there are still problems. On this server (rather than my
laptop), with 8 clients I can see errors like this before too long
(note that this output includes custom instrumentation from Jeff):


6670 2015-01-17 18:02:54 PST LOG:  JJ scan_all 1, relfrozenid -813636509
6670 2015-01-17 18:02:54 PST LOG:  JJ freezeLimit -661025537
6670 2015-01-17 18:02:54 PST LOG:  JJ freeze_min_age 5000
vacuum_freeze_table_age 15000 freeze_table_age 15000 ReadNew
-611025384
6670 2015-01-17 18:02:54 PST LOG:  JJ scan_all 1, relfrozenid -813636101
6670 2015-01-17 18:02:54 PST LOG:  JJ transaction ID wrap limit is
1352632427, limited by database with OID 12746
6670 2015-01-17 18:02:54 PST LOG:  autovacuum: done processing
database postgres at recent Xid of 3683945176 recent mxid of 1
6668 2015-01-17 18:02:54 PST ERROR:  conflicting key value violates
exclusion constraint upsert_race_test_index_excl
6668 2015-01-17 18:02:54 PST DETAIL:  Key (index)=(7142) conflicts
with existing key (index)=(600).
6668 2015-01-17 18:02:54 PST STATEMENT:  insert into upsert_race_test
(index, count) values ('7142','1') on conflict
  update set count=TARGET.count + EXCLUDED.count
  where TARGET.index = EXCLUDED.index
  returning upsert_race_test.count


It's always an exclusion violation problem that I see here.

As you can see, the query involved has no unique index inference
specification, per the hack to make this work with exclusion
constraints (the artificially much greater XID burn rate might have
also increased the likelihood of this error dramatically). You'll also
note that the DETAIL message seems to indicate that this
btree_gist-based exclusion constraint doesn't behave like a unique
constraint at all, because the conflicting new value (7142) is not at
all the same as the existing value (600). But that's wrong -- it's
supposed to be B-Tree-like. In short, there are further race
conditions with exclusion constraints.
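
For reference, the constraint under test is presumably along these
lines (a sketch rather than the exact DDL from the test suite):

create extension if not exists btree_gist;

-- With btree_gist, a GiST exclusion constraint on plain equality is
-- supposed to behave just like a unique constraint on "index".
create table upsert_race_test (
  index int,
  count int,
  exclude using gist (index with =)
);

A definition like this would produce the system-generated constraint
name upsert_race_test_index_excl seen in the error above.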

I think that the fundamental, unfixable race condition here is the
disconnect between index tuple insertion and checking for would-be
exclusion violations that exclusion constraints naturally have here,
that unique indexes naturally don't have [1] (note that I'm talking
only about approach #2 to value locking here; approach #1 isn't in
V2.0). I suspect that the feature is not technically feasible to make
work correctly with exclusion constraints, end of story. VACUUM
interlocking is probably also involved here, but the unfixable race
condition seems like our fundamental problem.
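
Schematically, the interleaving I have in mind is something like this
(an assumed reconstruction for illustration, not a traced execution):

-- Backend 1: inserts heap tuple + index tuple for some value
-- Backend 2: inserts heap tuple + index tuple for a conflicting value
-- Backend 1: re-scans the index for conflicts, finds backend 2's tuple
-- Backend 2: re-scans the index for conflicts, finds backend 1's tuple

With a unique index the conflict is caught atomically at insertion
time, while the relevant B-Tree page is locked; with an exclusion
constraint the insertion and the conflict check are separate steps,
leaving a window in which another backend's speculative tuple (or
VACUUM activity in between) can confuse the check.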

We could possibly spend a lot of time discussing whether or not I'm
right about it being inherently impossible to make INSERT ... ON
CONFLICT IGNORE work with exclusion constraints. However, I strongly
suggest that we cut scope and at least leave them out of