[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-02-02 Thread Heikki Linnakangas

On 01/03/2015 10:42 PM, Peter Geoghegan wrote:

On Sat, Jan 3, 2015 at 2:41 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:

A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In that
case, one of the transactions is bound to error and roll back anyway, but
you get a deadlock error instead of the constraint violation error, which is
not as nice.


Agreed.

I haven't experimentally verified that this can happen with exclusion
constraints without the ON CONFLICT patch being involved, but I would
be very surprised if it didn't. How could it possibly not happen? It's
not all that bad since, as you say, one or the other xact was going to
be aborted anyway. Since the insertions would have to occur at exactly
the same time, even if one backend were to get an exclusion violation
rather than being killed by the deadlock detector, the choice of which
backend to raise an error within would be essentially random anyway.


Yep. I just tested this, and I can confirm that it does happen with 
vanilla exclusion constraints. If two backends insert the same value at 
the same time, you usually get the error "conflicting key value violates 
exclusion constraint", but if the timing is right, you get "deadlock 
detected" instead.


Let's focus on fixing that case first. I wouldn't care otherwise, but it 
allows us to work on the locking, the super-deletion etc. without all 
the rest of the patch. That will provide you a way to split the patch 
into two: 1. Avoid deadlock errors with regular exclusion constraints, 
with all the locking etc. that's needed to solve that, and 2. Upsert.



1. On conflict, mark the inserted tuple as killed, and retry. But before
retrying, acquire a new kind of lock on the table, let's call it
SpeculativeInsertionLock. This fixes the deadlock, by retrying instead of
sleeping, and avoids the livelock because the new lock ensures that only one
backend retries at a time.


We super delete the tuple on retry already. However, we wait for the
other xact with our broken promise tuple still physically inserted
into the heap. We can't super delete the tuple before
XactLockTableWait()/SpeculativeInsertionWait() lock acquisition,
because doing so risks livelock. I think you already agree with me up
to here.

However, if we're not sleep waiting on the other xact (rather, we're
retrying [with a new class of exclusive table-level lock] instead of
sleeping), why should our xact be able to do useful work on retry?
Why won't it just spin/busy wait? More to the point, why wouldn't it
deadlock when it is obliged to wait on a third inserter's tuple?
AFAICT, you've just moved the problem around, because now we're
obliged to get a shared lock on the xid or speculative token
(XactLockTableWait()/SpeculativeInsertionWait()) of a session that
itself wants this new table level lock that only we have.


Sorry, I was very terse in my previous explanation. Let me try again. 
Let's begin with a simpler, poor-performance version of the scheme:


1. Acquire the new SpeculativeInsertionLock on the table
2. Insert tuple to heap and index.
3. Scan the index to see if there is any other conflicting tuple. (If 
there isn't, or the conflicting tuple is already committed, we're done)

4. Super-delete the tuple we already inserted
5. Release SpeculativeInsertionLock
6. XactLockTableWait() on the other guy

Note that we don't try to acquire any other locks while holding 
SpeculativeInsertionLock.
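To make the no-deadlock/no-livelock argument concrete, the six steps can be modeled as a toy simulation (Python, with invented stand-ins: a plain mutex for the proposed SpeculativeInsertionLock and an event for XactLockTableWait(); this is a sketch of the scheme, not Postgres code):

```python
import threading, time

class Tuple:
    """A heap tuple with a speculative-insertion state."""
    def __init__(self, value, owner):
        self.value, self.owner = value, owner
        self.state = "in_progress"
        self.done = threading.Event()        # stands in for XactLockTableWait()

    def finish(self, state):
        self.state = state
        self.done.set()

class Table:
    def __init__(self):
        # Stands in for the proposed table-level SpeculativeInsertionLock.
        self.spec_lock = threading.Lock()
        self.tuples = []

def insert_on_conflict_ignore(table, value, owner, commit_delay=0.0):
    """Steps 1-6 above: insert, check, super-delete, release, then wait."""
    while True:
        with table.spec_lock:                          # 1. serialize inserters
            mine = Tuple(value, owner)
            table.tuples.append(mine)                  # 2. insert to heap/index
            others = [t for t in table.tuples          # 3. scan for conflicts
                      if t.value == value and t is not mine
                      and t.state != "aborted"]
            if not others:
                break                                  # no conflict: keep our tuple
            mine.finish("aborted")                     # 4. super-delete our tuple
            waiters = [t for t in others if t.state == "in_progress"]
            if not waiters:
                return False                           # committed conflict: IGNORE
        # 5. the `with` block released the lock before we wait, and
        # 6. we wait on the other xact, then retry from the top.
        for t in waiters:
            t.done.wait()
    time.sleep(commit_delay)                           # rest of the transaction
    mine.finish("committed")
    return True
```

With two threads racing to insert the same value, exactly one commits and the other observes the committed conflict and IGNOREs it; crucially, no thread ever sleeps while holding the table-level lock, which is what rules out the deadlock.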


Compare this with the way unique-checks in b-tree work today. The B-tree 
check is free of race conditions because we hold the lock on the b-tree 
page while we check the visibility of the conflicting tuple, and we don't insert the 
index tuple if we have to wait. The SpeculativeInsertionLock 
accomplishes the same. It makes steps 3 and 4 atomic.


Compared to your patch as it stands, this fixes the deadlock because 
when A inserts a tuple and scans the index, any conflicting tuples it 
finds must have completed the insertion before A. The other inserters 
won't later try to wait on the tuple that A inserts.


We could do the above without the new lock, but as you've said, that 
would risk a livelock. Two concurrent inserters might repeatedly insert, 
scan, find each other, super-delete, and retry and not make progress. 
The lock fixes that by ensuring that only one inserter is doing 
the above at a time.


(With the above procedure, we could also first scan the index for 
conflicts, and only insert after that, because the 
SpeculativeInsertionLock prevents anyone else from inserting a 
conflicting row. But of course, we're not going to hold an exclusive 
table-level lock under normal circumstances anyway; the above was just a 
prelude to the real scheme below.)


The full procedure is this:

1. Insert tuple to heap and index
2. Scan the index to see if there is any other conflicting tuple. (If 
there isn't, or the conflicting tuple is already committed, we're done)

3. Super-delete the tuple we already inserted

4. Acquire 

[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-05 Thread Peter Geoghegan
On Mon, Jan 5, 2015 at 5:53 PM, Peter Geoghegan p...@heroku.com wrote:
 It's not all that easy to recreate, even with my custom
 instrumentation. With fsync=off, it occurs somewhat infrequently with
 a custom instrumentation Postgres build:

 pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 10
 [Mon Jan  5 17:31:57 2015] init done at count_upsert.pl line 101.
 [Mon Jan  5 17:32:17 2015] child abnormal exit [Mon Jan  5 17:32:17
 2015] DBD::Pg::db selectall_arrayref failed: ERROR:  mismatch in xmin
 for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line
 210.\n  at count_upsert.pl line 227.
 [Mon Jan  5 17:32:49 2015] sum is -800
 [Mon Jan  5 17:32:49 2015] count is 9515
 [Mon Jan  5 17:32:49 2015] normal exit at 1420507969 after 733912
 items processed at count_upsert.pl line 182.

 I think that super deleting broken promise tuples undermines how
 vacuum interlocking is supposed to work [1].

Hmm. On second thought, I think that the instrumentation I added here
can be confused by redirecting line pointers, and that may account for
this apparent problem. Still, I would like for us to figure out how it
was previously possible for the implementation to update a
super-deleted tuple (resulting in unusual all-NULLs tuples) since I
thought that in order for that to happen, other xacts would have to see
what was only ever a promise tuple as a fully-fledged tuple -- in
order for the logic that pre-checks for conflicts to find a conflict,
it would have to find the tuple already committed (so it certainly
couldn't be an unresolved promise tuple). My superficial fix [1] was
written without fully understanding what the problem was.

In any case, I doubt the changes that I made to tqual.c in that fix
will be accepted as-is.

[1] 
http://www.postgresql.org/message-id/cam3swztftt_fehet3tu3ykcpcypynnauquz3q+naasnh-60...@mail.gmail.com
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-05 Thread Peter Geoghegan
On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan p...@heroku.com wrote:
 The attached patch fixes Jeff's test case, as far as it goes. But, as
 I mentioned, it's not intended as a real fix. For one thing, I have
 made multiple changes to HeapTupleSatisfies*() routines, but haven't
 fully considered the consequences for, say, ri_triggers.c.
 HeapTupleSatisfiesSelf() and HeapTupleSatisfiesHistoricMVCC() were not
 modified. I've modified HeapTupleSatisfiesVacuum() to see
InvalidTransactionId (not invalid xid infomask bits) as visible for
 garbage collection (alongside HeapTupleSatisfiesDirty() and
 HeapTupleSatisfiesMVCC(), which I changed first), and I wouldn't be
 surprised if that created new problems in other areas. In short, this
 patch is just for illustrative purposes.

I think that I see another race condition, requiring custom
instrumentation to the server to demonstrate.

Basically, even with the fix above, there are still similar races. I'm
not trying to call ExecLockUpdateTuple() against super deleted NULL
tuples, because those are no longer visible to any relevant snapshot
type, per the fix described above. However, I can still see a change in
tuple header xmin between a dirty index scan and attempting to lock
that row to update. If I'm not mistaken, the race condition is small
enough that something like Jeff's tool won't catch it directly in a
reasonable amount of time, because it happens to be the same index key
value.

It's not all that easy to recreate, even with my custom
instrumentation. With fsync=off, it occurs somewhat infrequently with
a custom instrumentation Postgres build:

pg@hamster:~/jjanes_upsert$ perl count_upsert.pl 8 10
[Mon Jan  5 17:31:57 2015] init done at count_upsert.pl line 101.
[Mon Jan  5 17:32:17 2015] child abnormal exit [Mon Jan  5 17:32:17
2015] DBD::Pg::db selectall_arrayref failed: ERROR:  mismatch in xmin
for (322,264). Initially 4324145, then 4324429 at count_upsert.pl line
210.\n  at count_upsert.pl line 227.
[Mon Jan  5 17:32:49 2015] sum is -800
[Mon Jan  5 17:32:49 2015] count is 9515
[Mon Jan  5 17:32:49 2015] normal exit at 1420507969 after 733912
items processed at count_upsert.pl line 182.

I think that super deleting broken promise tuples undermines how
vacuum interlocking is supposed to work [1]. We end up at the wrong
tuple, that happens to occupy the same slot as the old one (and
happens to have the same index key value, because the race is so small
and there are relatively few distinct values in play - it's hard to
recreate). The attached instrumentation patch was used with Jeff Janes'
tool.

If we weren't super deleting in the patch, then I think that our
backend's xmin horizon would be enough of an interlock (haven't tested
that theory by testing value locking approach #1 implementation yet,
but I worried about the scenario quite a lot before and things seemed
okay).  But we are super deleting with implementation #2, and we're
also not holding a pin on the heap buffer or B-Tree leaf page buffer
throughout, so this happens (if I'm not mistaken).

[1] 
https://github.com/postgres/postgres/blob/REL9_3_STABLE/src/backend/access/nbtree/README#L142
-- 
Peter Geoghegan
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index f8a4017..4ce45e3 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1189,7 +1189,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
  */
 bool
 ExecCheckIndexConstraints(TupleTableSlot *slot,
-		  EState *estate, ItemPointer conflictTid,
+		  EState *estate, HeapTuple conflictTid,
 		  Oid arbiterIdx)
 {
 	ResultRelInfo *resultRelInfo;
@@ -1339,7 +1339,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	 Datum *values, bool *isnull,
 	 EState *estate, bool newIndex,
 	 bool violationOK, bool wait,
-	 ItemPointer conflictTid)
+	 HeapTuple conflictTid)
 {
 	Oid		   *constr_procs;
 	uint16	   *constr_strats;
@@ -1469,7 +1469,10 @@ retry:
 		{
 			conflict = true;
 			if (conflictTid)
-				*conflictTid = tup->t_self;
+			{
+				conflictTid->t_self = tup->t_self;
+				conflictTid->t_data = tup->t_data;
+			}
 			break;
 		}
 
@@ -1506,7 +1509,10 @@ retry:
 		{
 			conflict = true;
 			if (conflictTid)
-				*conflictTid = tup->t_self;
+			{
+				conflictTid->t_self = tup->t_self;
+				conflictTid->t_data = tup->t_data;
+			}
 			break;
 		}
 
diff --git a/src/backend/executor/nodeModifyTable.c b/src/backend/executor/nodeModifyTable.c
index 69af2d1..a289ef4 100644
--- a/src/backend/executor/nodeModifyTable.c
+++ b/src/backend/executor/nodeModifyTable.c
@@ -56,7 +56,7 @@
 
 static bool ExecLockUpdateTuple(EState *estate,
 ResultRelInfo *relinfo,
-ItemPointer conflictTid,
+HeapTuple 	conflictTid,
 TupleTableSlot *planSlot,
 TupleTableSlot *insertSlot,
 bool			canSetTag,
@@ -292,7 +292,7 @@ ExecInsert(TupleTableSlot *slot,
 	else
 	{
 		bool		conflict;
-		

[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-04 Thread Peter Geoghegan
On Sat, Jan 3, 2015 at 10:16 PM, Peter Geoghegan p...@heroku.com wrote:
 I looked at the code in more detail, and realized that there were old
 bugs in the exclusion constraint related modifications. I attach a
 delta patch that fixes them. This is a combined patch that is all that
 is needed to apply on top of v1.8.vallock2.tar.gz [1] to have all
 available bugfixes.

I've updated Jeff Janes' test suite to support testing of exclusion
constraints that are equivalent to unique indexes:

https://github.com/petergeoghegan/jjanes_upsert/commit/a941f423e9500b847b1a9d1805ba52cb11db0ae9

(This requires a quick hack to the Postgres source code to accept
exclusion constraints as ON CONFLICT UPDATE arbiters).

So far, everything seems okay with exclusion constraints, as far as I
can determine using the stress tests that we have. This is an
encouraging sign.

-- 
Peter Geoghegan




[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-03 Thread Heikki Linnakangas

On 01/03/2015 02:43 AM, Peter Geoghegan wrote:

On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan p...@heroku.com wrote:

I've been working on fixing the bugs that Jeff Janes found [1] with
approach #2 to value locking [2]. Approach #1 was unaffected.


Unfortunately, exclusion constraints (which only value locking
approach #2 has support for, for the IGNORE variant) are broken, even
with my fix for the largely unrelated issues also described in my
opening mail. I now believe that it is inherently impossible to
support exclusion constraints with INSERT ... ON CONFLICT IGNORE, if
we are insistent that there cannot be livelocks, unprincipled
deadlocks or spurious exclusion violations... and I think you all know
how I feel about those  :-)  . I think that we should drop support for
exclusion constraints, regardless of whatever else happens with value
locking.

Previously, my mental model of approach #2 was that it more or less
handled B-Trees and exclusion constraint related indexes equivalently.
This is untrue, though.

Today, in general, exclusion constraints more or less work by
physically inserting a heap tuple and index tuple relating to the
exclusion constraint (typically this is with a GiST index). There
could well be a conflict due to a concurrent insertion, or even due to
the fact that there was a conflicting tuple all along. The
implementation tries to find a conclusively committed conflicting
tuple. If it does not find such a tuple, it knows that its own already
physically inserted tuple is enough of a reason for others to have to
worry about it, and so it may proceed and commit the xact -- once it
does a whole index scan with only its own tuple found (and not any
overlapping tuple from an unfinished xact), it's clear that it can
proceed.

Exclusion constraints are made to do a pre-check by approach #2 to
value locking, to do an IGNORE, but that isn't really significant
here. The reason that #2 is broken for exclusion constraints but not
for unique indexes is that unique index insertion reports a conflict
in the process of inserting. In contrast, unique constraints insert
optimistically, and re-check. What's the difference? Well, even though
(since the implementation passes UNIQUE_CHECK_PARTIAL for unique index
insertion) we still insert when there is a conflict, there is an
important distinction: Value locking. That is to say, even though we
still insert, we also still use buffer locks as value locks in the
usual way that unique indexes do (we hold an exclusive buffer lock on
the first B-Tree leaf page the value could be on, as always). We still
decide whether there is a conflict while the exclusive buffer lock is
held, which is a sufficient choke point to make that determination
conclusive. This implies that even with many concurrent insertions
into a newly created table, there will definitely be one inserter that
conclusively does *not* have a conflict, while the others might
(depends on the status of the guy who conclusively had *no* conflict,
or his line of successor xacts that may or may not themselves commit).

In a private e-mail to Heikki, I pointed out that he was mistaken when
he said that there is a pre-existing problem with exclusion
constraints: it is not possible for concurrent inserters to both
spuriously get exclusion violations when only one needs to. Heikki
accepted this. However, I now realize that Heikki was closer to being
right about it than I was; you can't get spurious exclusion
violations, but you can get something that's about the same, which is
a deadlock. That usually doesn't matter that much, because people
usually don't worry about the issue with exclusion constraints. But in
a world where they support INSERT ... ON CONFLICT IGNORE, that isn't
acceptable, IMV.

I have a test case:
https://github.com/petergeoghegan/upsert/blob/master/exclusion.sh

I see both spurious exclusion violations, and unprincipled deadlocks
with this simple IGNORE-based testcase, that uses a GiST + btree_gist
exclusion constraint. I didn't see livelocks, which would have been
really bad, because a mutual need to wait on each other's xact causes
a deadlock, and the deadlock detector detects those. I think that the
spurious exclusion violations are merely an artifact of the
implementation, as opposed to an inherent problem with exclusion
constraints (in other words, I don't withdraw my remarks from the last
paragraph - only (arguably) unprincipled deadlocks occur with ordinary
exclusion constraint enforcement with lots of concurrency).


I read this email several times but could not understand. Please explain 
in words of one syllable how the deadlock arises. Then, please find a 
way to fix it.


- Heikki





[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-03 Thread Heikki Linnakangas

On 01/03/2015 12:00 PM, Peter Geoghegan wrote:

Two concurrent exclusion constraints inserters can easily insert at
exactly the same time, and then wait on each other's xact, and then
deadlock. That can't happen with B-Tree inserts because the checking
and insertion happen at the same time, when that exclusive buffer lock
is held. Some inserter establishes the right to insert, and then
actually inserts atomically, and when it releases the buffer lock
every other inserter will see for itself that it has inserted (and
established the right to do so).


A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In 
that case, one of the transactions is bound to error and roll back 
anyway, but you get a deadlock error instead of the constraint violation 
error, which is not as nice.



I'm sorry, but I honestly don't see a way to fix this one. It would
take a very novel approach, since exclusion constraints can work with
any amgettuple AM. I briefly thought about doing something crazy with
the deadlock detector, but apart from anything else I think that might
introduce livelock risks.


Some ideas off the top of my head:

1. On conflict, mark the inserted tuple as killed, and retry. But before 
retrying, acquire a new kind of lock on the table, let's call it 
SpeculativeInsertionLock. This fixes the deadlock, by retrying instead 
of sleeping, and avoids the livelock because the new lock ensures that 
only one backend retries at a time.


2. Use e.g. the XID (or backend id or something) to resolve the tie. 
When you have inserted a tuple and find that it conflicts with another 
in-progress insertion, check the conflicting tuple's xmin. If it's < 
current XID, wait for the other inserter to finish. Otherwise kill the 
already-inserted tuple first, and wait only after that.
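Idea #2 is essentially a total order on inserters. As a sketch (hypothetical names; the real tie-break could equally use backend id), the decision rule looks like:

```python
def resolve_conflict(my_xid, conflict_xmin):
    """Idea #2 as a decision rule: order the two inserters by XID so the
    wait-for edges all point one way and can never form a cycle."""
    if conflict_xmin < my_xid:
        # The other inserter is older: kill our own tuple first, then wait.
        return "kill_own_then_wait"
    # We are older: keep our tuple and wait for the other inserter as-is.
    return "wait"

# Two concurrent inserters with XIDs 100 and 101 that conflict with each other:
assert resolve_conflict(100, conflict_xmin=101) == "wait"
assert resolve_conflict(101, conflict_xmin=100) == "kill_own_then_wait"
```

Because the inserter with the higher XID always kills its own tuple before waiting, only one side ever holds a live tuple while sleeping, so the mutual dependency cannot arise.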


3. Don't allow the deadlock checker to kick in. Instead, use timeout 
with exponential backoff to avoid getting stuck in the livelock 
indefinitely.
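Idea #3, exponential backoff, can be sketched as follows (a toy helper with invented parameters, not an existing Postgres facility):

```python
import random

def backoff_retry(attempt, base=0.001, cap=1.0):
    """Exponential backoff with jitter: each colliding inserter would
    super-delete its tuple, sleep for this many seconds, and retry.  The
    randomized, growing intervals make it increasingly unlikely that two
    inserters keep colliding forever."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```

The jitter is what breaks the symmetry: two inserters that collided on attempt N almost certainly wake at different times on attempt N+1, so one of them wins outright instead of the pair spinning in lockstep.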



Can there be other lock types involved in the deadlock? AFAICS it's 
always going to be between two or more backends that wait on each other 
with XactLockTableWait (or some variant of that specific to speculative 
XactLockTableWait (or some variant of that specific to speculative 
insertions).


- Heikki





[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-03 Thread Peter Geoghegan
On Sat, Jan 3, 2015 at 1:29 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 Please explain in words of one syllable how the deadlock arises. Then,
 please find a way to fix it.

I believe that the deadlock arises because there is no choke point.
Exclusion constraint insertions always insert first, then check for a
conflict later. Whereas with B-Trees, we check *while* inserting, at
the critical point when an exclusive buffer lock is held on the first
leaf page a value could be on.

Two concurrent exclusion constraints inserters can easily insert at
exactly the same time, and then wait on each other's xact, and then
deadlock. That can't happen with B-Tree inserts because the checking
and insertion happen at the same time, when that exclusive buffer lock
is held. Some inserter establishes the right to insert, and then
actually inserts atomically, and when it releases the buffer lock
every other inserter will see for itself that it has inserted (and
established the right to do so).
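The difference between the two orderings can be shown with a small wait-for-graph model (Python; a deliberately simplified sketch in which every inserted tuple is still in progress and a "check" step waits on every conflicting tuple it finds):

```python
def exclusion_insert_schedule(steps):
    """Replay a schedule of (xact, "insert"/"check") steps and return the
    wait-for graph: at its check, an xact waits on every other tuple that
    was already physically inserted."""
    inserted = []
    waits_for = {}
    for xact, action in steps:
        if action == "insert":
            inserted.append(xact)
        else:
            waits_for[xact] = [x for x in inserted if x != xact]
    return waits_for

def has_cycle(graph):
    """A cycle in the wait-for graph is what the deadlock detector reports."""
    def visit(node, path):
        return any(m in path or visit(m, path | {m})
                   for m in graph.get(node, []))
    return any(visit(node, {node}) for node in graph)

# B-Tree: checking and inserting are atomic under the leaf-page buffer lock,
# so the insert/check pair of one backend cannot interleave with another's:
btree = [("A", "insert"), ("A", "check"), ("B", "insert"), ("B", "check")]

# Exclusion constraint: both backends insert first, then both check:
excl = [("A", "insert"), ("B", "insert"), ("A", "check"), ("B", "check")]

assert not has_cycle(exclusion_insert_schedule(btree))  # B waits on A; no cycle
assert has_cycle(exclusion_insert_schedule(excl))       # mutual wait: deadlock
```

In the B-Tree schedule the waits all point one way (the later inserter waits on the earlier one); in the exclusion-constraint schedule each backend's scan finds the other's already-inserted tuple, producing the mutual dependency described above.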

I'm sorry, but I honestly don't see a way to fix this one. It would
take a very novel approach, since exclusion constraints can work with
any amgettuple AM. I briefly thought about doing something crazy with
the deadlock detector, but apart from anything else I think that might
introduce livelock risks.
-- 
Peter Geoghegan




[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-03 Thread Peter Geoghegan
On Sat, Jan 3, 2015 at 2:41 AM, Heikki Linnakangas
hlinnakan...@vmware.com wrote:
 A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In that
 case, one of the transactions is bound to error and roll back anyway, but
 you get a deadlock error instead of the constraint violation error, which is
 not as nice.

Agreed.

I haven't experimentally verified that this can happen with exclusion
constraints without the ON CONFLICT patch being involved, but I would
be very surprised if it didn't. How could it possibly not happen? It's
not all that bad since, as you say, one or the other xact was going to
be aborted anyway. Since the insertions would have to occur at exactly
the same time, even if one backend were to get an exclusion violation
rather than being killed by the deadlock detector, the choice of which
backend to raise an error within would be essentially random anyway.

There are probably additional factors that make the situation more
complicated than it is for the ON CONFLICT patch, but it's clear to me
that the mutual dependency that can be involved with two ordinary
exclusion constraint inserters is reason enough for those to deadlock.

 I'm sorry, but I honestly don't see a way to fix this one. It would
 take a very novel approach, since exclusion constraints can work with
 any amgettuple AM. I briefly thought about doing something crazy with
 the deadlock detector, but apart from anything else I think that might
 introduce livelock risks.

 Some ideas off the top of my head:

 1. On conflict, mark the inserted tuple as killed, and retry. But before
 retrying, acquire a new kind of lock on the table, let's call it
 SpeculativeInsertionLock. This fixes the deadlock, by retrying instead of
 sleeping, and avoids the livelock because the new lock ensures that only one
 backend retries at a time.

We super delete the tuple on retry already. However, we wait for the
other xact with our broken promise tuple still physically inserted
into the heap. We can't super delete the tuple before
XactLockTableWait()/SpeculativeInsertionWait() lock acquisition,
because doing so risks livelock. I think you already agree with me up
to here.

However, if we're not sleep waiting on the other xact (rather, we're
retrying [with a new class of exclusive table-level lock] instead of
sleeping), why should our xact be able to do useful work on retry?
Why won't it just spin/busy wait? More to the point, why wouldn't it
deadlock when it is obliged to wait on a third inserter's tuple?
AFAICT, you've just moved the problem around, because now we're
obliged to get a shared lock on the xid or speculative token
(XactLockTableWait()/SpeculativeInsertionWait()) of a session that
itself wants this new table level lock that only we have.

 2. Use e.g. the XID (or backend id or something) to resolve the tie. When
 you have inserted a tuple and find that it conflicts with another
 in-progress insertion, check the conflicting tuple's xmin. If it's < current
 XID, wait for the other inserter to finish. Otherwise kill the
 already-inserted tuple first, and wait only after that.

That sounds scary. I think it might introduce livelock hazards. What
about some third xact, that's waiting on our XID, when we ourselves
super delete the already-inserted tuple in deference to the first
xact/inserter? I also worry about the bloating this could cause.

 3. Don't allow the deadlock checker to kick in. Instead, use timeout with
 exponential backoff to avoid getting stuck in the livelock indefinitely.

I don't know if that would work, but it seems very undesirable...to
add a new behavior to the deadlock detector just to make ON CONFLICT
IGNORE work with exclusion constraints. It seems to me like the best
suggestion, though.

 Can there be other lock types involved in the deadlock? AFAICS it's always
 going to be between two or more backends that wait on each other with
 XactLockTableWait (or some variant of that specific to speculative
 insertions).

I believe you're right about that. But don't forget that at least with
the current implementation, we also get spurious exclusion violations.
I think that's because we super-delete, as we must for other reasons
(so they're not a concern for regular inserters today, as I said). So
solving the problem for regular inserters is probably easier than
solving it for the ON CONFLICT patch.

-- 
Peter Geoghegan




[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-03 Thread Peter Geoghegan
On Sat, Jan 3, 2015 at 12:42 PM, Peter Geoghegan p...@heroku.com wrote:
 There are probably additional factors that make the situation more
 complicated than it is for the ON CONFLICT patch, but it's clear to me
 that the mutual dependency that can be involved with two ordinary
 exclusion constraint inserters is reason enough for those to deadlock.

I looked at the code in more detail, and realized that there were old
bugs in the exclusion constraint related modifications. I attach a
delta patch that fixes them. This is a combined patch that is all that
is needed to apply on top of v1.8.vallock2.tar.gz [1] to have all
available bugfixes.

This unbreaks my previous exclusion constraint test case, as far as it
goes. I am still concerned about the risk of lock starvation/livelocks
here. But I must admit, having stressed the patch with this latest
fix, that it's possible that the real risks have been overestimated.
Maybe this is in the same category as the way L&Y's algorithm could
theoretically starve a session with a pagesplit that needs to move
right indefinitely. It's a theoretical risk that turns out to not
matter in practice, for reasons that lack a real theoretical
justification beyond the fact that sustained very bad luck - a
sustained perfect storm - is required for starvation to occur. OTOH,
maybe my OS scheduler doesn't have the characteristics that provoke
the suspected bug. I really don't know, but I suppose that one or the
other concurrent inserter will tend to win the race often enough - a
draw may be a very rare thing.

[1] 
http://www.postgresql.org/message-id/cam3swzrg_htrol-6_wfe6_d_ucuyc28jfapsfh_tra76gkk...@mail.gmail.com
-- 
Peter Geoghegan
diff --git a/src/backend/executor/execUtils.c b/src/backend/executor/execUtils.c
index ea8c57b..f8a4017 100644
--- a/src/backend/executor/execUtils.c
+++ b/src/backend/executor/execUtils.c
@@ -1133,8 +1133,11 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		 * If the index has an associated exclusion constraint, check that.
 		 * This is simpler than the process for uniqueness checks since we
 		 * always insert first and then check.  If the constraint is deferred,
-		 * we check now anyway, but don't throw error on violation; instead
-		 * we'll queue a recheck event.
+		 * we check now anyway, but don't throw error on violation or wait for
+		 * a conclusive outcome from a concurrent insertion; instead we'll
+		 * queue a recheck event.  Similarly, noDupErr callers (speculative
+		 * inserters) will recheck later, and wait for a conclusive outcome
+		 * then.
 		 *
 		 * An index for an exclusion constraint can't also be UNIQUE (not an
 		 * essential property, we just don't allow it in the grammar), so no
@@ -1142,15 +1145,15 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 		 */
 		if (indexInfo->ii_ExclusionOps != NULL)
 		{
-			bool		errorOK = (!indexRelation->rd_index->indimmediate &&
-								   !noDupErr);
+			bool		violationOK = (!indexRelation->rd_index->indimmediate ||
+									   noDupErr);
 
 			satisfiesConstraint =
 				check_exclusion_or_unique_constraint(heapRelation,
 													 indexRelation, indexInfo,
 													 tupleid, values, isnull,
-													 estate, false, errorOK,
-													 false, NULL);
+													 estate, false,
+													 violationOK, false, NULL);
 		}
 
 		if ((checkUnique == UNIQUE_CHECK_PARTIAL ||
@@ -1158,7 +1161,7 @@ ExecInsertIndexTuples(TupleTableSlot *slot,
 			!satisfiesConstraint)
 		{
 			/*
-			 * The tuple potentially violates the uniqueness or exclusion
+			 * The tuple potentially violates the unique index or exclusion
 			 * constraint, so make a note of the index so that we can re-check
 			 * it later.
 			 */
@@ -1322,7 +1325,7 @@ ExecCheckIndexConstraints(TupleTableSlot *slot,
  * is convenient for deferred exclusion checks; we need not bother queuing
  * a deferred event if there is definitely no conflict at insertion time.
  *
- * When errorOK is false, we'll throw error on violation, so a false result
+ * When violationOK is false, we'll throw error on violation, so a false result
  * is impossible.
  *
  * Note: The indexam is normally responsible for checking unique constraints,
@@ -1335,7 +1338,7 @@ check_exclusion_or_unique_constraint(Relation heap, Relation index,
 	 IndexInfo *indexInfo, ItemPointer tupleid,
 	 Datum *values, bool *isnull,
 	 EState *estate, bool newIndex,
-	 bool errorOK, bool wait,
+	 bool violationOK, bool wait,
 	 ItemPointer conflictTid)
 {
 	Oid		   *constr_procs;
@@ -1462,7 +1465,7 @@ retry:
 		 * we're not supposed to raise error, just return the fact of the
 		 * potential conflict without waiting to see if it's real.
 		 */
-		if (errorOK && !wait)
+		if (violationOK && !wait)
 		{
 			conflict = true;
 			if (conflictTid)
@@ -1487,7 +1490,7 @@ retry:
 			if (DirtySnapshot.speculativeToken)
 SpeculativeInsertionWait(DirtySnapshot.xmin,
 		 DirtySnapshot.speculativeToken);
-			else if 

[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

2015-01-02 Thread Peter Geoghegan
On Thu, Jan 1, 2015 at 11:17 PM, Peter Geoghegan p...@heroku.com wrote:
 I've been working on fixing the bugs that Jeff Janes found [1] with
 approach #2 to value locking [2]. Approach #1 was unaffected.

Unfortunately, exclusion constraints (which only value locking
approach #2 has support for, for the IGNORE variant) are broken, even
with my fix for the largely unrelated issues also described in my
opening mail. I now believe that it is inherently impossible to
support exclusion constraints with INSERT ... ON CONFLICT IGNORE, if
we are insistent that there cannot be livelocks, unprincipled
deadlocks or spurious exclusion violations -- and I think you all know
how I feel about those :-). I think that we should drop support for
exclusion constraints, regardless of whatever else happens with value
locking.

Previously, my mental model of approach #2 was that it more or less
handled B-Trees and exclusion constraint related indexes equivalently.
This is untrue, though.

Today, in general, exclusion constraints more or less work by
physically inserting a heap tuple and index tuple relating to the
exclusion constraint (typically this is with a GiST index). There
could well be a conflict due to a concurrent insertion, or even due to
the fact that there was a conflicting tuple all along. The
implementation tries to find a conclusively committed conflicting
tuple. If it does not find such a tuple, it knows that its own already
physically inserted tuple is enough of a reason for others to have to
worry about it, and so it may proceed and commit the xact -- once it
does a whole index scan with only its own tuple found (and not any
overlapping tuple from an unfinished xact), it's clear that it can
proceed.
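
The insert-then-check dance described above can be sketched in a toy
model (illustrative Python, not PostgreSQL internals; the `insert`,
`check` and xact names are invented for the sketch). It shows how two
concurrent inserters of the same value each physically insert first,
then each find the other's in-progress tuple on the re-scan, so each
would end up waiting on the other's xact:

```python
# Toy model of the exclusion constraint protocol: insert physically
# first, then scan for conflicting tuples and wait on any in-progress
# conflicter before re-checking.

index = []          # (value, xact) pairs, standing in for index tuples
waits_on = {}       # xact -> the xact it would block on

def insert(xact, value):
    """Physically insert first (optimistically)."""
    index.append((value, xact))

def check(xact, value):
    """Scan for conflicting tuples; wait on any in-progress conflicter."""
    for v, other in index:
        if v == value and other != xact:
            waits_on[xact] = other   # would XactLockTableWait() here
            return False             # conflict: must wait, then re-check
    return True                      # only our own tuple: no conflict

# Two backends insert the same value "at the same time":
insert("xact1", 42)
insert("xact2", 42)

# Each then scans and finds the other's unresolved tuple...
check("xact1", 42)
check("xact2", 42)

# ...yielding a mutual wait: xact1 -> xact2 -> xact1, i.e. a deadlock.
assert waits_on == {"xact1": "xact2", "xact2": "xact1"}
```

The key property is that neither inserter can conclusively decide it
has no conflict, because there is no serialization point between the
two physical insertions.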

Approach #2 to value locking makes exclusion constraints do a pre-check
for the IGNORE case, but that isn't really significant here. The reason
that #2 is broken for exclusion constraints but not for unique indexes
is that unique index insertion reports a conflict in the process of
inserting. In contrast, exclusion constraints insert optimistically,
and re-check. What's the difference? Well, even though
(since the implementation passes UNIQUE_CHECK_PARTIAL for unique index
insertion) we still insert when there is a conflict, there is an
important distinction: Value locking. That is to say, even though we
still insert, we also still use buffer locks as value locks in the
usual way that unique indexes do (we hold an exclusive buffer lock on
the first B-Tree leaf page the value could be on, as always). We still
decide whether there is a conflict while an exclusive buffer lock is
held, which is a sufficient choke point to make that determination
conclusive. This implies that even with many concurrent insertions
into a newly created table, there will definitely be one inserter that
conclusively does *not* have a conflict, while the others might
(depends on the status of the guy who conclusively had *no* conflict,
or his line of successor xacts that may or may not themselves commit).
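
By way of contrast, the effect of B-Tree-style value locking can be
sketched like so (again a toy Python model under stated assumptions,
not PostgreSQL internals; the lock stands in for the exclusive buffer
lock on the leaf page): because the conflict check and the insertion
happen atomically under one lock, exactly one of many concurrent
inserters conclusively sees no conflict.

```python
# Toy contrast: a single value lock serializes check + insert, so one
# inserter definitively wins while the others definitively see a conflict.
import threading

leaf_lock = threading.Lock()   # stands in for the leaf-page buffer lock
index = []                     # (value, xact) pairs
outcomes = []                  # (xact, saw_conflict) pairs

def btree_insert(xact, value):
    with leaf_lock:            # value lock: check + insert are atomic
        conflict = any(v == value for v, _ in index)
        if not conflict:
            index.append((value, xact))
        outcomes.append((xact, conflict))

threads = [threading.Thread(target=btree_insert, args=("xact%d" % i, 42))
           for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Exactly one inserter conclusively has no conflict; the rest see one.
assert sum(1 for _, c in outcomes if not c) == 1
```
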

In a private e-mail to Heikki, I pointed out that he was mistaken when
he said that there is a pre-existing problem with exclusion
constraints: it is not possible for concurrent inserters to both
spuriously get exclusion violations when only one needs to. Heikki
accepted this. However, I now realize that Heikki was closer to being
right about it than I was; you can't get spurious exclusion
violations, but you can get something that's about the same, which is
a deadlock. That usually doesn't matter that much, because people
usually don't worry about the issue with exclusion constraints. But in
a world where they support INSERT ... ON CONFLICT IGNORE, that isn't
acceptable, IMV.

I have a test case:
https://github.com/petergeoghegan/upsert/blob/master/exclusion.sh

I see both spurious exclusion violations and unprincipled deadlocks
with this simple IGNORE-based test case, which uses a GiST + btree_gist
exclusion constraint. I didn't see livelocks, which would have been
really bad, because a mutual need to wait on each other's xact causes
a deadlock, and the deadlock detector detects those. I think that the
spurious exclusion violations are merely an artifact of the
implementation, as opposed to an inherent problem with exclusion
constraints (in other words, I don't withdraw my remarks from the last
paragraph - only (arguably) unprincipled deadlocks occur with ordinary
exclusion constraint enforcement with lots of concurrency).
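
For what it's worth, the core of what the deadlock detector does --
find a cycle in the waits-for graph -- can be sketched as follows
(illustrative Python; PostgreSQL's actual algorithm in
src/backend/storage/lmgr/deadlock.c is far more elaborate, handling
lock modes, soft edges and wait-queue rearrangement):

```python
# Minimal waits-for cycle detection: each blocked xact waits on exactly
# one other xact in this toy model, so following edges either terminates
# (no deadlock) or revisits a node (deadlock).

def find_deadlock(waits_for, start):
    """Follow waits-for edges from `start`; return the cycle as a list
    of xacts if we come back around, else None."""
    seen = []
    xact = start
    while xact in waits_for:
        if xact in seen:
            return seen[seen.index(xact):]   # the cycle
        seen.append(xact)
        xact = waits_for[xact]
    return None

# Two backends that inserted the same value and each wait on the other:
waits_for = {"xact1": "xact2", "xact2": "xact1"}
print(find_deadlock(waits_for, "xact1"))   # ['xact1', 'xact2']
```
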

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers