Re: [HACKERS] MERGE SQL Statement for PG11

Peter Geoghegan Mon, 05 Mar 2018 20:26:51 -0800

On Mon, Mar 5, 2018 at 3:02 AM, Pavan Deolasee <pavan.deola...@gmail.com> wrote:
>> Again, I have to ask: is such an UPDATE actually meaningfully
>> different from a concurrent DELETE + INSERT? If so, why is a special
>> error better than a dup violation, or maybe even having the INSERT
>> (and whole MERGE statement) succeed?
>>
>
> Ok, I agree. I have updated the patch to remove the serialization error.


Cool. This is really shaping up. Thank you for working so hard to
address my concerns.

> If MATCHED changes to NOT MATCHED because of concurrent update/delete, we now
> simply retry from the top and execute the first NOT MATCHED action, if WHEN
> AND qual passes on the updated version. Of course, if the MERGE does not
> contain any NOT MATCHED action then we simply ignore the target row and move
> to the next row. Since a NOT MATCHED case can never turn into a MATCHED
> case, there is no risk of a live lock.

Makes sense. We either follow an UPDATE chain, which must always make
forward progress, or do NOT MATCHED handling/insert a row, which only
happens when NOT MATCHED handling has been abandoned, and so similarly
must make forward progress. I think that this needs to be documented
in a central place, though. I'd like to see arguments for why the
design is livelock safe, as well as an explanation for how EPQ
handling works, what the goals of how it works are, and so on. I would
perhaps summarize it by saying something like the following within
ExecMerge():

"""
ExecMerge() EPQ handling repeatedly rechecks all WHEN MATCHED actions
for each new candidate tuple as an update chain is followed, either
until a tuple is actually updated/deleted, or until we decide that the
new join input row actually requires WHEN NOT MATCHED handling after
all. Rechecking join quals for a single candidate tuple is performed
by ExecUpdate() and ExecDelete(), which can specially instruct us to
advance to the next tuple in the update chain so that we can recheck
our own WHEN ... AND quals (when the join quals no longer pass due to
a concurrent UPDATE), or to switch to the WHEN NOT MATCHED case (when
join quals no longer pass due to a concurrent DELETE).

EPQ only ever installs a new tuple for the target relation (never the
source), so changing from one WHEN MATCHED action to another during
READ COMMITTED conflict handling must be due to a concurrent UPDATE
that changes WHEN ... AND qual referenced attribute(s). If it was due
to a concurrent DELETE, or, equivalently, some concurrent UPDATE that
affects the target's primary key attribute(s) (or whatever the outer
join's target qual attributes are), then we switch to WHEN NOT MATCHED
handling, which will never switch back to WHEN MATCHED. ExecMerge()
avoids livelocks by always either walking the UPDATE chain, which
makes forward progress, or by finally switching to WHEN NOT MATCHED
processing.

"""

ExecUpdate()/ExecDelete() are not "driving" EPQ here, which is new --
they're doing one small part of it (the join quals for one tuple) on
behalf of ExecMerge(). I don't expect you to just take my wording
without editing or moving parts around a bit; I think that you'll get
the general idea from what I've written, though.

Actually, my wording may need to be changed to reflect a new code
structure for EPQ. The current control flow makes the reader think
about UPDATE quals, as well as DELETE quals. Instead, the reader
should think about WHEN ... AND quals, as well as MERGE's outer JOIN
quals (the join quals are the same, regardless of whether an UPDATE or
DELETE happens to be involved). The control flow between ExecMerge(),
ExecUpdate(), and ExecDelete() is just confusing.

Sure, ExecUpdate() and ExecDelete() *do* require a little special
handling for MERGE, but ExecMerge() should itself call EvalPlanQual()
for the join quals, rather than getting ExecUpdate()/ExecDelete() to
do it. All that ExecUpdate()/ExecDelete() need to tell ExecMerge() is
"you need to do your HeapTupleUpdated stuff". This not only clarifies
the new EPQ design -- it also lets you do that special case rti EPQ
stuff in the right place (BTW, I'm not a huge fan of that detail in
general, but haven't thought about it enough to say more). The new EPQ
code within ExecUpdate() and ExecDelete() is already identical, so why
not do it that way?

I especially like this organization because ExecOnConflictUpdate()
already calls ExecUpdate() without expecting it to do EPQ handling at
all, which is kind of similar to what I suggest -- in both cases, the
caller "takes care of all of the details of the scan". And, by
returning a HTSU_Result value to ExecMerge() from
ExecUpdate()/ExecDelete(), you could also do the new cardinality
violation stuff from within ExecMerge() in exactly one place -- no
need to invent new error_on_SelfUpdate arguments.

I would also think about organizing ExecMerge() handling so that
CMD_INSERT handling became WHEN NOT MATCHED handling. It's
unnecessarily confusing to have that included in the same switch
statement as CMD_UPDATE and CMD_DELETE, since that organization fails
to convey that once we reach WHEN NOT MATCHED, there is no going back.
Actually, I suppose the new EPQ organization described in the last few
paragraphs already implies that you'd do this CMD_INSERT stuff, since
the high level goal is breaking ExecMerge() into WHEN MATCHED and WHEN
NOT MATCHED sections (doing "commonality and variability analysis" for
CMD_UPDATE and CMD_DELETE only serves that high level goal).

Other feedback:

* Is a new ExecMaterializeSlot() call needed for the EPQ slot? Again,
doing your own EvalPlanQual() within ExecMerge() would perhaps have
avoided this.

* Do we really need UpdateStmt raw parser node pointers in structs
that appear in execnodes.h? What's that all about?

* Tests for transition table behavior (mixed INSERTs, UPDATEs, and
DELETEs) within triggers.sql seems like a good idea.

* Is this comment really accurate?:

> +               /*
> +                * If EvalPlanQual did not return a tuple, it means we
> +                * have seen a concurrent delete, or a concurrent update
> +                * where the row has moved to another partition.
> +                *
> +                * Set matched = false and loop back if there exists a NOT
> +                * MATCHED action. Otherwise, we have nothing to do for this
> +                * tuple and we can continue to the next tuple.
> +                */

Won't we loop back when a concurrent update occurs that makes the new
candidate tuple not satisfy the join qual anymore? What does this have
to do with partitioning?

* Why does MergeJoin get referenced here?:

> + * Also, since the internal MergeJoin node can hide the source and target
> + * relations, we must explicitly make the respective relation as visible so
> + * that columns can be referenced unqualified from these relations.

That's a struct that has nothing to do with SQL MERGE in particular
(no more than hash join does, for example).

* This patch could use a pg_indent.

* We already heard about this code from Tomas, who found it unnecessary:

> +           /*
> +            * SQL Standard says that WHEN AND conditions must not
> +            * write to the database, so check we haven't written
> +            * any WAL during the test. Very sensible that is, since
> +            * we can end up evaluating some tests multiple times if
> +            * we have concurrent activity and complex WHEN clauses.
> +            *
> +            * XXX If we had some clear form of functional labelling
> +            * we could use that, if we trusted it.
> +            */
> +           if (startWAL < GetXactWALBytes())
> +               ereport(ERROR,
> +                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                        errmsg("cannot write to database within WHEN AND 
> condition")));

This needs to go. Apart from the fact that GetXactWALBytes() is buggy
(it returns int64 for the unsigned type XLogRecPtr), the whole idea
just seems unnecessary. I don't see why this is any different to using
a volatile function in a regular UPDATE.

* I still think we probably need to add something to ruleutils.c, so
that MERGE Query structs can be deparsed -- see get_query_def().

Yeah, we've decided we're not going to support user-visible rules, but
I continue to think that deparsing support is necessary on general
principle, and for the benefit of extensions like Citus that use
deparsing in a fairly broad way. At the very least, if we're not going
to support deparsing, there needs to be a better reason than "we're
not supporting user-visible rules".

-- 
Peter Geoghegan

Re: [HACKERS] MERGE SQL Statement for PG11

Reply via email to