On Wed, Oct 1, 2014 at 4:34 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> Summary of algorithm to use "promise tuples" for concurrency control
> during UPSERT

> Index bloat is less of a problem than with normal inserts since there
> are additional ways of removing promise tuples. Only one index tuple
> at a time can have a specific value, so we would actually reduce heap
> bloat caused by concurrent inserts.

I am not all that concerned about bloat. I didn't think it was a major
advantage of #1 relative to #2 when I investigated the differences,
and looked at the numbers. And I'm speaking as the advocate of the
design with zero bloat.

> It's possible we find existing rows that we can't see according to our
> snapshot. That is handled in exactly the same as the way we treat

This isn't a special case in the patch. It's more like REPEATABLE READ
and SERIALIZABLE have a special responsibility to make sure they're
not updating an invisible-to-MVCC-snapshot (not "instantaneously
invisible", by which I mean invisible in the
HeapTupleSatisfiesUpdate() sense). Otherwise, we don't care if it's
MVCC-visible or not.

I imagine that by "That is handled in exactly the same as the way we
treat UPDATEs", you mean that we do the EvalPlanQual() stuff. We only
do that in the event of a concurrent UPDATE/DELETE, though. Otherwise,
we trust the underlying relation scan to accurate represent that
tuples pulled up are visible. So I don't understand the comparison
(but tell me if I've misunderstood).

> There is a potential loop here in that we might find an index tuple,
> follow it, find out the tuple is actually dead then return to insert a
> promise tuple, only to find that someone else just did that and we
> have to wait/re-follow the link to update the new tuple. That is an
> extremely unlikely race condition, though possible; there is no heavy
> locking to prevent that in this approach. In principle deadlocks are
> possible from that, but that is not any different from the current
> case where people that insert same key at same time might cause
> deadlocks. Its not a common application pattern and not something we
> should be protecting against.

People are going to use this feature in cases where they could almost
use an UPDATE. It will happen if you let it happen. But even if you
don't, a guarantee is infinitely more useful than a non-guarantee to
app developers. I'll be up-front about this: you have very little
chance of getting me to budge on this point. I *really* hate the idea
of allowing this kind of thing. I don't think I'm alone here.

> All of this is only needed for unique indexes.
> It can handled by a new API call for acquire_value_lock() and
> release_value_lock(), or similar.
> Advantages
> * We don't do anything weird in the heap - if this breaks, data is not corrupt

Index corruption leads to wrong answers. I don't think this is a very
good argument, fwiw.

> * There is no heap bloat or index bloat above existing levels

Fair enough.

> * The approach leverages existing mechanism for transaction waiting

That's not really an advantage. That more or less applies to all designs.

> * Optimistic approach to value locking will improve performance over
> heavy weight locking

There is no evidence that promise index tuples will perform better. #2
didn't perform better than #1.

> Disadvantages
> * Not written yet - <1 month to code, still possible for 9.5, doesn't look 
> hard

Maybe, but that is beside the point, which is: If there were any
fundamental problems with either #1 or #2, I think I'd have figured
them out by now. They are less risky today - it might be that #3 turns
out to have the same properties, but I cannot be sure. That counts for
something. I feel perfectly entitled to hold that kind of uncertainty
against any design, tbh.

> Other stated possible disadvantages
> * Violates existing invariant that every index tuple has a
> corresponding heap tuple, which goes back to the Berkeley days.
> Currently, we always create heap tuples first, and physically delete
> them last. [Why is that a problem? Could be, but why?]

Unknown unknowns. Huge amounts of code must be audited to figure out
that it isn't an issue. So I don't know, but then I'm not the one
making the proposal.

> ("Deleting them last" implies something is being touched in that
> regard by this suggestion. I'm not sure what you are referring to)

The uncertain scope of the problem is a big part of the problem.

> * Relatedly, concern about breaking VACUUM. VACUUMing of btrees occurs
> with a list of TIDs to kill in index, built from the heap. Current
> bulk delete callback cannot be used for this. Super-exclusive lock is
> needed to delete tuples in btree page (i.e. VACUUM). Normally skipping
> of LockBufferForCleanup() (as added by bbb6e559) works okay in heap
> because it tends to imply that list of tids won't be bulk deleted in
> index, where we currently cannot skip for failure to quickly acquire
> super exclusive lock. So this could make the problem addressed by
> bbb6e559 return to some extent.
> [Don't see any problems; just test the xid as we scan a promise tuple
> and remove it if needed]
> "Index-only" bloat becomes a possibility. Implications are not well 
> understood.
> [FUD, no reason to believe there is a problem.]

"no reason to believe there is a problem" isn't that good a defense. I
know this from experience.

> Doesn't have a strategy for dealing with unprincipled deadlocks, at
> least at the moment. Presumably some aspects of #2 could be adopted
> here.
> [FUD. Unprincipled deadlock still not properly explained as yet. No
> way of telling whether it will happen with this approach, or not].

What's unclear about unprincipled deadlocks? Heikki went to
considerable effort to have his design meet my standard of not doing
that. I think that there is almost no controversy about whether or not
we need to avoid that at this stage.

If you want a more practical definition, I think it's very unlikely
that I'll accuse any implementation of exhibiting this problem once it
doesn't deadlock with the test-case I presented Heikki with last year

Peter Geoghegan

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to