[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

Heikki Linnakangas Mon, 02 Feb 2015 09:38:50 -0800

On 01/03/2015 10:42 PM, Peter Geoghegan wrote:

On Sat, Jan 3, 2015 at 2:41 AM, Heikki Linnakangas
<[email protected]> wrote:

A-ha, I see. And this can happen without INSERT ON CONFLICT, too? In that
case, one of the transactions is bound to error and roll back anyway, but
you get a deadlock error instead of the constraint violation error, which is
not as nice.


Agreed.

I haven't experimentally verified that this can happen with exclusion
constraints without the ON CONFLICT patch being involved, but I would
be very surprised if it didn't. How could it possibly not happen? It's
not all that bad since, as you say, one or the other xact was going to
be aborted anyway. Since the insertions would have to occur at exactly
the same time, even if one backend were to get an exclusion violation
rather than being killed by the deadlock detector, the choice of which
backend to raise an error within would be essentially random anyway.

Yep. I just tested this, and I can confirm that it does happen withvanilla exclusion constraints. If two backends insert the same value atthe same time, you usually get the error "conflicting key value violatesexclusion constraint", but if the timing is right, you get "deadlockdetected" instead.

Let's focus on fixing that case first. I wouldn't care otherwise, but itallows us to work on the locking, the "super-deletion" etc. without allthe rest of the patch. That will provide you a way to split the patchinto two: 1. Avoid deadlock errors with regular exclusion constraints,with all the locking etc. that's needed to solve that, and 2. Upsert.

1. On conflict, mark the inserted tuple as killed, and retry. But before
retrying, acquire a new kind of lock on the table, let's call it
SpeculativeInsertionLock. This fixes the deadlock, by retrying instead of
sleeping, and avoids the livelock because the new lock ensures that only one
backend retries at a time.


We "super delete" the tuple on retry already. However, we wait for the
other xact with our broken promise tuple still physically inserted
into the heap. We can't super delete the tuple before
XactLockTableWait()/SpeculativeInsertionWait() lock acquisition,
because doing so risks livelock. I think you already agree with me up
to here.

However, if we're not sleep waiting on the other xact (rather, we're
"retrying [with a new class of exclusive table-level lock] instead of
sleeping"), why should our xact be able to do useful work on retry?
Why won't it just spin/busy wait? More to the point, why wouldn't it
deadlock when it is obliged to wait on a third inserter's tuple?
AFAICT, you've just moved the problem around, because now we're
obliged to get a shared lock on the xid or speculative token
(XactLockTableWait()/SpeculativeInsertionWait()) of a session that
itself wants this new table level lock that only we have.

Sorry, I was very terse in my previous explanation. Let me try again.Let's begin with a simpler, poor-performance version of the scheme:


1. Acquire the new SpeculativeInsertionLock on the table
2. Insert tuple to heap and index.

3. Scan the index to see if there is any other conflicting tuple. (Ifthere isn't, or the conflicting tuple is already committed, we're done)

4. Super-delete the tuple we already inserted
5. Release SpeculativeInsertionLock
6. XactLockTableWait() on the other guy

Note that we don't try to acquire any other locks while holdingSpeculativeInsertionLock.

Compare this with the way unique-checks in b-tree work today. The B-treecheck is free of race conditions because we hold the lock on the b-treepage while we check the visibility of the page, and we don't insert theindex tuple if we have to wait. The SpeculativeInsertionLockaccomplishes the same. It makes steps 3 and 4 atomic.

Compared to your patch as it stands, this fixes the deadlock becausewhen A inserts a tuple and scans the index, any conflicting tuples itfinds must have completed the insertion before A. The other inserterswon't later try to wait on the tuple that A inserts.

We could do the above without the new lock, but as you've said, thatwould risk a livelock. Two concurrent inserters might repeatedly insert,scan, find each other, super-delete, and retry and not make progress.The lock fixes that by ensuring that there is only one inserter is doingthe above at a time.

(With the above procedure, we could also first scan the index forconflicts, and only insert after that, because theSpeculativeInsertionLock prevents anyone else from inserting aconflicting row. But of course, we're not going to hold an exclusivetable-level lock under normal circumstances anyway; the above was just aprelude to the real scheme below.)


The full procedure is this:

1. Insert tuple to heap and index

2. Scan the index to see if there is any other conflicting tuple. (Ifthere isn't, or the conflicting tuple is already committed, we're done)

3. Super-delete the tuple we already inserted

4. Acquire SpeculativeInsertionLock on the table
5. Insert tuple to heap and index

6. Scan the index to see if there is any other conflicting tuple. (Ifthere isn't, or the conflicting tuple is already committed, we're done)

7. Super-delete the tuple we already inserted
8. Release SpeculativeInsertionLock
9. XactLockTableWait() on the other guy

So in a nutshell, first try to do the insertion without the table-levellock. But because that would be prone to livelocks, if it doesn'timmediately work, retry with the table-level lock.

2. Use e.g. the XID (or backend id or something) to resolve the tie. When
you have inserted a tuple and find that it conflicts with another
in-progress insertion, check the conflicting tuple's xmin. If it's < current
XID, wajt for the other inserter to finish. Otherwise kill the
already-inserted tuple first, and wait only after that.


That sounds scary. I think it might introduce livelock hazards. What
about some third xact, that's waiting on our XID, when we ourselves
super delete the already-inserted tuple in deference to the first
xact/inserter?

He will get woken up when we super-delete the already-inserted tuple.What problem do you see there?

I actually like this scheme the best. It's simple. I haven't made anyrigorous analysis to prove that it's correct, but I don't immediatelysee a problem with it.

I also worry about the bloating this could cause.

Well, this is all under the assumption that conflicts andsuper-deletions are rare. Of course, we must avoid the situation wherewe have to attempt an insertion dozens of times until it succeeds,leaving behind 20x bloat compared to the real data. I don't think thatwould happen with these schemes, although it will need some testing I guess.

Can there be other lock types involved in the deadlock? AFAICS it's always
going to be between two or more backends that wait on each with
XactLockTableWait (or some variant of that specific to speculative
insertions).


I believe you're right about that. But don't forget that at least with
the current implementation, we also get spurious exclusion violations.

Ok, I clearly haven't been able to keep up with all the issues here. Howdo spurious exclusion violations arise?


- Heikki



--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Re: Problems with approach #2 to value locking (INSERT ... ON CONFLICT UPDATE/IGNORE patch)

Reply via email to