On Mon, Mar 5, 2018 at 3:02 AM, Pavan Deolasee <pavan.deola...@gmail.com> wrote: >> Again, I have to ask: is such an UPDATE actually meaningfully >> different from a concurrent DELETE + INSERT? If so, why is a special >> error better than a dup violation, or maybe even having the INSERT >> (and whole MERGE statement) succeed? >> > > Ok, I agree. I have updated the patch to remove the serialization error.
Cool. This is really shaping up. Thank you for working so hard to address my concerns. > If MATCHED changes to NOT MATCHED because of concurrent update/delete, we now > simply retry from the top and execute the first NOT MATCHED action, if WHEN > AND qual passes on the updated version. Of course, if the MERGE does not > contain any NOT MATCHED action then we simply ignore the target row and move > to the next row. Since a NOT MATCHED case can never turn into a MATCHED > case, there is no risk of a live lock. Makes sense. We either follow an UPDATE chain, which must always make forward progress, or do NOT MATCHED handling/insert a row, which only happens when NOT MATCHED handling has been abandoned, and so similarly must make forward progress. I think that this needs to be documented in a central place, though. I'd like to see arguments for why the design is livelock safe, as well as an explanation for how EPQ handling works, what the goals of how it works are, and so on. I would perhaps summarize it by saying something like the following within ExecMerge(): """ ExecMerge() EPQ handling repeatedly rechecks all WHEN MATCHED actions for each new candidate tuple as an update chain is followed, either until a tuple is actually updated/deleted, or until we decide that the new join input row actually requires WHEN NOT MATCHED handling after all. Rechecking join quals for a single candidate tuple is performed by ExecUpdate() and ExecDelete(), which can specially instruct us to advance to the next tuple in the update chain so that we can recheck our own WHEN ... AND quals (when the join quals no longer pass due to a concurrent UPDATE), or to switch to the WHEN NOT MATCHED case (when join quals no longer pass due to a concurrent DELETE). EPQ only ever installs a new tuple for the target relation (never the source), so changing from one WHEN MATCHED action to another during READ COMMITTED conflict handling must be due to a concurrent UPDATE that changes WHEN ... AND qual referenced attribute(s). If it was due to a concurrent DELETE, or, equivalently, some concurrent UPDATE that affects the target's primary key attribute(s) (or whatever the outer join's target qual attributes are), then we switch to WHEN NOT MATCHED handling, which will never switch back to WHEN MATCHED. ExecMerge() avoids livelocks by always either walking the UPDATE chain, which makes forward progress, or by finally switching to WHEN NOT MATCHED processing. """ ExecUpdate()/ExecDelete() are not "driving" EPQ here, which is new -- they're doing one small part of it (the join quals for one tuple) on behalf of ExecMerge(). I don't expect you to just take my wording without editing or moving parts around a bit; I think that you'll get the general idea from what I've written, though. Actually, my wording may need to be changed to reflect a new code structure for EPQ. The current control flow makes the reader think about UPDATE quals, as well as DELETE quals. Instead, the reader should think about WHEN ... AND quals, as well as MERGE's outer JOIN quals (the join quals are the same, regardless of whether an UPDATE or DELETE happens to be involved). The control flow between ExecMerge(), ExecUpdate(), and ExecDelete() is just confusing. Sure, ExecUpdate() and ExecDelete() *do* require a little special handling for MERGE, but ExecMerge() should itself call EvalPlanQual() for the join quals, rather than getting ExecUpdate()/ExecDelete() to do it. All that ExecUpdate()/ExecDelete() need to tell ExecMerge() is "you need to do your HeapTupleUpdated stuff". This not only clarifies the new EPQ design -- it also lets you do that special case rti EPQ stuff in the right place (BTW, I'm not a huge fan of that detail in general, but haven't thought about it enough to say more). The new EPQ code within ExecUpdate() and ExecDelete() is already identical, so why not do it that way? I especially like this organization because ExecOnConflictUpdate() already calls ExecUpdate() without expecting it to do EPQ handling at all, which is kind of similar to what I suggest -- in both cases, the caller "takes care of all of the details of the scan". And, by returning a HTSU_Result value to ExecMerge() from ExecUpdate()/ExecDelete(), you could also do the new cardinality violation stuff from within ExecMerge() in exactly one place -- no need to invent new error_on_SelfUpdate arguments. I would also think about organizing ExecMerge() handling so that CMD_INSERT handling became WHEN NOT MATCHED handling. It's unnecessarily confusing to have that included in the same switch statement as CMD_UPDATE and CMD_DELETE, since that organization fails to convey that once we reach WHEN NOT MATCHED, there is no going back. Actually, I suppose the new EPQ organization described in the last few paragraphs already implies that you'd do this CMD_INSERT stuff, since the high level goal is breaking ExecMerge() into WHEN MATCHED and WHEN NOT MATCHED sections (doing "commonality and variability analysis" for CMD_UPDATE and CMD_DELETE only serves that high level goal). Other feedback: * Is a new ExecMaterializeSlot() call needed for the EPQ slot? Again, doing your own EvalPlanQual() within ExecMerge() would perhaps have avoided this. * Do we really need UpdateStmt raw parser node pointers in structs that appear in execnodes.h? What's that all about? * Tests for transition table behavior (mixed INSERTs, UPDATEs, and DELETEs) within triggers.sql seems like a good idea. * Is this comment really accurate?: > + /* > + * If EvalPlanQual did not return a tuple, it means we > + * have seen a concurrent delete, or a concurrent update > + * where the row has moved to another partition. > + * > + * Set matched = false and loop back if there exists a NOT > + * MATCHED action. Otherwise, we have nothing to do for this > + * tuple and we can continue to the next tuple. > + */ Won't we loop back when a concurrent update occurs that makes the new candidate tuple not satisfy the join qual anymore? What does this have to do with partitioning? * Why does MergeJoin get referenced here?: > + * Also, since the internal MergeJoin node can hide the source and target > + * relations, we must explicitly make the respective relation as visible so > + * that columns can be referenced unqualified from these relations. That's a struct that has nothing to do with SQL MERGE in particular (no more than hash join does, for example). * This patch could use a pg_indent. * We already heard about this code from Tomas, who found it unnecessary: > + /* > + * SQL Standard says that WHEN AND conditions must not > + * write to the database, so check we haven't written > + * any WAL during the test. Very sensible that is, since > + * we can end up evaluating some tests multiple times if > + * we have concurrent activity and complex WHEN clauses. > + * > + * XXX If we had some clear form of functional labelling > + * we could use that, if we trusted it. > + */ > + if (startWAL < GetXactWALBytes()) > + ereport(ERROR, > + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), > + errmsg("cannot write to database within WHEN AND > condition"))); This needs to go. Apart from the fact that GetXactWALBytes() is buggy (it returns int64 for the unsigned type XLogRecPtr), the whole idea just seems unnecessary. I don't see why this is any different to using a volatile function in a regular UPDATE. * I still think we probably need to add something to ruleutils.c, so that MERGE Query structs can be deparsed -- see get_query_def(). Yeah, we've decided we're not going to support user-visible rules, but I continue to think that deparsing support is necessary on general principle, and for the benefit of extensions like Citus that use deparsing in a fairly broad way. At the very least, if we're not going to support deparsing, there needs to be a better reason than "we're not supporting user-visible rules". -- Peter Geoghegan