On 01/03/2015 10:42 PM, Peter Geoghegan wrote:
On Sat, Jan 3, 2015 at 2:41 AM, Heikki Linnakangas
<hlinnakan...@vmware.com> wrote:
A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In that
case, one of the transactions is bound to error and roll back anyway, but
you get a deadlock error instead of the constraint violation error, which is
not as nice.


I haven't experimentally verified that this can happen with exclusion
constraints without the ON CONFLICT patch being involved, but I would
be very surprised if it didn't. How could it possibly not happen? It's
not all that bad since, as you say, one or the other xact was going to
be aborted anyway. Since the insertions would have to occur at exactly
the same time, even if one backend were to get an exclusion violation
rather than being killed by the deadlock detector, the choice of which
backend to raise an error within would be essentially random anyway.

Yep. I just tested this, and I can confirm that it does happen with vanilla exclusion constraints. If two backends insert the same value at the same time, you usually get the error "conflicting key value violates exclusion constraint", but if the timing is right, you get "deadlock detected" instead.

Let's focus on fixing that case first. I wouldn't care otherwise, but it allows us to work on the locking, the "super-deletion" etc. without all the rest of the patch. That will provide you a way to split the patch into two: 1. Avoid deadlock errors with regular exclusion constraints, with all the locking etc. that's needed to solve that, and 2. Upsert.

1. On conflict, mark the inserted tuple as killed, and retry. But before
retrying, acquire a new kind of lock on the table, let's call it
SpeculativeInsertionLock. This fixes the deadlock, by retrying instead of
sleeping, and avoids the livelock because the new lock ensures that only one
backend retries at a time.

We "super delete" the tuple on retry already. However, we wait for the
other xact with our broken promise tuple still physically inserted
into the heap. We can't super delete the tuple before
XactLockTableWait()/SpeculativeInsertionWait() lock acquisition,
because doing so risks livelock. I think you already agree with me up
to here.

However, if we're not sleep waiting on the other xact (rather, we're
"retrying [with a new class of exclusive table-level lock] instead of
sleeping"), why should our xact be able to do useful work on retry?
Why won't it just spin/busy wait? More to the point, why wouldn't it
deadlock when it is obliged to wait on a third inserter's tuple?
AFAICT, you've just moved the problem around, because now we're
obliged to get a shared lock on the xid or speculative token
(XactLockTableWait()/SpeculativeInsertionWait()) of a session that
itself wants this new table level lock that only we have.

Sorry, I was very terse in my previous explanation. Let me try again. Let's begin with a simpler, poor-performance version of the scheme:

1. Acquire the new SpeculativeInsertionLock on the table
2. Insert tuple to heap and index.
3. Scan the index to see if there is any other conflicting tuple. (If there isn't, or the conflicting tuple is already committed, we're done)
4. Super-delete the tuple we already inserted
5. Release SpeculativeInsertionLock
6. XactLockTableWait() on the other guy

Note that we don't try to acquire any other locks while holding SpeculativeInsertionLock.

Compare this with the way unique-checks in b-tree work today. The B-tree check is free of race conditions because we hold the lock on the b-tree page while we check the visibility of the page, and we don't insert the index tuple if we have to wait. The SpeculativeInsertionLock accomplishes the same. It makes steps 3 and 4 atomic.

Compared to your patch as it stands, this fixes the deadlock because when A inserts a tuple and scans the index, any conflicting tuples it finds must have completed the insertion before A. The other inserters won't later try to wait on the tuple that A inserts.

We could do the above without the new lock, but as you've said, that would risk a livelock. Two concurrent inserters might repeatedly insert, scan, find each other, super-delete, and retry and not make progress. The lock fixes that by ensuring that there is only one inserter is doing the above at a time.

(With the above procedure, we could also first scan the index for conflicts, and only insert after that, because the SpeculativeInsertionLock prevents anyone else from inserting a conflicting row. But of course, we're not going to hold an exclusive table-level lock under normal circumstances anyway; the above was just a prelude to the real scheme below.)

The full procedure is this:

1. Insert tuple to heap and index
2. Scan the index to see if there is any other conflicting tuple. (If there isn't, or the conflicting tuple is already committed, we're done)
3. Super-delete the tuple we already inserted

4. Acquire SpeculativeInsertionLock on the table
5. Insert tuple to heap and index
6. Scan the index to see if there is any other conflicting tuple. (If there isn't, or the conflicting tuple is already committed, we're done)
7. Super-delete the tuple we already inserted
8. Release SpeculativeInsertionLock
9. XactLockTableWait() on the other guy

So in a nutshell, first try to do the insertion without the table-level lock. But because that would be prone to livelocks, if it doesn't immediately work, retry with the table-level lock.

2. Use e.g. the XID (or backend id or something) to resolve the tie. When
you have inserted a tuple and find that it conflicts with another
in-progress insertion, check the conflicting tuple's xmin. If it's < current
XID, wajt for the other inserter to finish. Otherwise kill the
already-inserted tuple first, and wait only after that.

That sounds scary. I think it might introduce livelock hazards. What
about some third xact, that's waiting on our XID, when we ourselves
super delete the already-inserted tuple in deference to the first

He will get woken up when we super-delete the already-inserted tuple. What problem do you see there?

I actually like this scheme the best. It's simple. I haven't made any rigorous analysis to prove that it's correct, but I don't immediately see a problem with it.

I also worry about the bloating this could cause.

Well, this is all under the assumption that conflicts and super-deletions are rare. Of course, we must avoid the situation where we have to attempt an insertion dozens of times until it succeeds, leaving behind 20x bloat compared to the real data. I don't think that would happen with these schemes, although it will need some testing I guess.

Can there be other lock types involved in the deadlock? AFAICS it's always
going to be between two or more backends that wait on each with
XactLockTableWait (or some variant of that specific to speculative

I believe you're right about that. But don't forget that at least with
the current implementation, we also get spurious exclusion violations.

Ok, I clearly haven't been able to keep up with all the issues here. How do spurious exclusion violations arise?

- Heikki

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to