On 3 November 2017 at 16:35, Peter Geoghegan <p...@bowt.ie> wrote:
> Simon Riggs <si...@2ndquadrant.com> wrote:
>>>> The *only* behavioural difference I have proposed would be the *lack*
>>>> of an ERROR in (some) concurrent cases.
>>>
>>> I think that's a big difference. Error vs. non-error is a big deal by
>>> itself;
>>
>> Are you saying avoiding an ERROR is a bad or good thing?
>
> Are you really asking Robert to repeat what has already been said about
> a dozen different ways?
I'm asking for clarity of explanation rather than assertions.

> That's *not* the only difference. You need to see a couple of steps
> ahead to see further differences, as the real dilemma comes when you
> have to reconcile having provided the UPSERT-guarantees with cases that
> that doesn't map on to (which can happen in a number of different ways).
>
> I don't understand why you'll talk about just about anything but that.
> This is a high-level concern about the overarching design. Do you really
> not understand the concern at this point?

You're either referring to what is in the docs, which is that INSERT ...
ON CONFLICT violates MVCC in a particular way, or to something as yet
unstated. If it is the former, then I still don't see the problem (see
later). If it is the latter, I need more. So either way I need more.

Robert Haas said
> In the past, there have been objections to implementations of MERGE
> which would give rise to such serialization anomalies, but I'm not
> sure we should feel bound by those discussions. One thing that's
> different is that the common and actually-useful case can now be made
> to work in a fairly satisfying way using INSERT .. ON CONFLICT UPDATE;
> if less useful cases are vulnerable to some weirdness, maybe it's OK
> to just document the problems.

I agreed with that, and still do.

We need a clear, explicit description of this situation, so I will
attempt that in detail here.

The basic concurrency problem we have is this:

APPROACH1
1. Join to produce results based upon snapshot at start of query
2. Apply results for INSERT, UPDATE or DELETE

Given there is a time delay between 1 and 2 there is a race condition,
so that if another user concurrently inserts the same value into a
unique index then an INSERT will fail with a uniqueness violation.
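The race window between steps 1 and 2 can be sketched in plain Python
(purely illustrative, not executor code; `approach1_merge`, `table` and
`snapshot` are made-up stand-ins for the real machinery):

```python
def approach1_merge(key, value, table, snapshot):
    # Step 1 happened earlier: `snapshot` holds the keys visible at query
    # start. Step 2 now applies the result based on that stale view.
    if key in snapshot:
        table[key] = value  # row was visible at step 1 -> UPDATE
        return "updated"
    if key in table:
        # A concurrent session inserted this key between steps 1 and 2.
        raise RuntimeError("duplicate key value violates unique constraint")
    table[key] = value      # absent from snapshot and table -> INSERT
    return "inserted"
```

The longer the gap between taking the snapshot and applying the result,
the wider the window in which a concurrent insert forces the ERROR path.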
Such failures are of great concern in practice because the time between
1 and 2 could be very long for large statements, or for smaller
statements we might have sufficiently high concurrency to allow us to
see regular failures.

APPROACH2 (modified from my original proposal slightly)
1. Join...
2. Apply results for UPDATE; if the tuple is not visible via the
   snapshot taken at 1, do EPQ to ensure we locate the current live tuple
3. If still not visible, do speculative insertion if we have a unique
   index available, otherwise ERROR. If speculative insertion fails, go
   to 2

The loop created above can live-lock, meaning that an infinite loop
could be created. In practice, such live-locks are rare and we could
detect them by falling out of the loop after a few tries. Approach2's
purpose is to alleviate the errors of Approach1, so falling out of the
loop merely takes us back to the error we would have got had we not
tried, so Approach2 has considerable benefit over Approach1.

This only applies if we do an INSERT, so if there is a
WHEN NOT MATCHED ... AND clause with probability W that makes the
INSERT rare, then the probability of error in Approach2 approaches the
probability of error in Approach1 as W drops to zero, but with W high
we may avoid many errors. Approach2 never generates more errors than
Approach1.

I read that step 3 in Approach2 is some kind of problem in MVCC
semantics. My understanding is that the SQL Standard allows us to
define what the semantics of the statement are in relation to
concurrency, so any semantic issue can be handled by defining it to
work the way we want.

The semantics are:
a) when a unique index is available we avoid errors by using the
   semantics of INSERT .. ON CONFLICT UPDATE
b) when a unique index is not available we use other semantics

To me this is the same as INSERTs failing in the presence of unique
indexes, but not failing when no index is present. The presence of a
unique constraint alters the semantics of the query.
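As a rough model of the Approach2 loop (again illustrative Python, not
PostgreSQL internals; `interfere` stands in for a concurrent session,
and `max_retries` for the fall-out-after-a-few-tries live-lock guard):

```python
def merge_row(key, new_value, table, unique_index_available=True,
              interfere=lambda: None, max_retries=3):
    """Apply one MERGE result row under the Approach2 loop."""
    for _ in range(max_retries):
        # Step 2: a live tuple (located via the EPQ recheck in the real
        # executor) means we UPDATE and are done.
        if key in table:
            table[key] = new_value
            return "updated"
        # Step 3: no visible tuple and no unique index -> ERROR.
        if not unique_index_available:
            raise RuntimeError("no unique index available")
        interfere()  # window in which a concurrent INSERT may land
        if key not in table:
            table[key] = new_value  # speculative insertion succeeded
            return "inserted"
        # Speculative insertion failed; go back to step 2 and retry.
    # Live-lock guard tripped: fall out with the same uniqueness ERROR
    # Approach1 would have raised without retrying.
    raise RuntimeError("duplicate key value violates unique constraint")
```

A concurrent insert that lands in the speculative-insertion window simply
turns the row into an UPDATE on the next pass, rather than the ERROR
Approach1 would raise; exhausting the retries surfaces that same ERROR.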
We can choose Approach2 - as Robert says, "[we should not] feel bound by
those [earlier] discussions".

Please explain what is wrong with the above without merely asserting
there is a problem.

As you point out, whichever we choose, we will be bound by those
semantics. So if we take Approach1, as has been indicated currently,
what is the written explanation for that, so we can show it to the
people who ask in the future about our decisions?

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers