Re: [HACKERS] WIP patch for multiple column assignment in UPDATE

Pavel Stehule Thu, 19 Jun 2014 00:04:16 -0700

Hello

I did some tests and It looks so it allows only some form of nested loop.


postgres=# explain (analyze, timing off, buffers) update a1 set b = (select
b from a2 where a1.a = a2.a);
                                                  QUERY
PLAN
---------------------------------------------------------------------------------------------------------------
 Update on a1  (cost=0.00..8456925.00 rows=1000000 width=10) (actual rows=0
loops=1)
   Buffers: shared hit=9017134 read=14376 dirtied=58170 written=1014
   ->  Seq Scan on a1  (cost=0.00..8456925.00 rows=1000000 width=10)
(actual rows=1000000 loops=1)
         Buffers: shared hit=4005465 read=4424 written=971
         SubPlan 1
           ->  Index Scan using a2_a_idx on a2  (cost=0.42..8.44 rows=1
width=4) (actual rows=1 loops=1000000)
                 Index Cond: (a1.a = a)
                 Buffers: shared hit=4005464
 Planning time: 0.212 ms
 Execution time: 30114.101 ms
(10 rows)

do you plan some sophisticated mechanism - like MERGE or some similar?

Regards

Pavel


2014-06-16 17:17 GMT+02:00 Tom Lane <t...@sss.pgh.pa.us>:

> Attached is a very-much-WIP patch for supporting
>          UPDATE foo SET ..., (a,b,c) = (select x,y,z from ...), ...
>
> It lacks documentation, ruleutils.c support, or any solution for the
> rule NEW.* expansion problem I mentioned Saturday.  The reason I'm
> posting it now is to get feedback about an implementation choice that
> feels a bit klugy to me; but I don't see any clearly better idea.
>
> The post-parse-analysis representation I've chosen here is that the
> output columns of the sub-select are represented by PARAM_MULTIEXEC
> Params, and the sub-select itself appears in a resjunk entry at the end
> of the targetlist; that is, the UPDATE tlist is more or less like
>
>    $1,  -- to be assigned to a
>    $2,  -- to be assigned to b
>    $3,  -- to be assigned to c
>    (select x,y,z from ...),  -- resjunk entry, value will be discarded
>
> If the sub-select is uncorrelated with the outer query, the planner
> turns it into an InitPlan, replacing the resjunk tlist item with a
> NULL constant, and then everything happens normally at execution.
>
> But more usually, the sub-select will be correlated with the outer
> query.  In this case, the subquery turns into a SubPlan tree that
> stays in the resjunk item.  At the start of execution, the ParamExecData
> structs for each of its output Params are marked with execPlan pointers
> to the subplan, just as would happen for an InitPlan.  This causes the
> subplan to get executed when the first of the output Params is evaluated;
> it loads the ParamExecData structs for all its output Params, and then
> the later Params just take data from the structs.  When execution reaches
> the MULTIEXEC SubPlan in the resjunk tlist item, no evaluation of the
> subplan is needed; but instead we re-mark all the output ParamExecData
> structs with non-null execPlan pointers, so that a fresh execution of
> the subplan will happen in the next evaluation of the targetlist.
>
> The klugy aspect of this is that it assumes that the SubPlan item will
> appear later in the tlist than all the Params referencing it.  This is
> true at the moment because resjunk tlist items always appear after
> non-resjunk ones.  There are a few other places that already depend on
> this ordering, but we've mostly tried to avoid introducing new
> dependencies on it.
>
> The alternative that I'd originally had in mind, before put-it-in-a-
> resjunk-item occurred to me, was to invent a new "secondary tlist"
> field of Query and of ModifyTable plan nodes, as I sketched back in
> http://www.postgresql.org/message-id/1783.1399054...@sss.pgh.pa.us
> We'd put the MULTIEXEC SubPlans in this secondary tlist and expect
> the executor to evaluate it just before evaluating the main tlist.
> However, that way isn't terribly pretty either, because it extends
> knowledge of this feature to a *whole lot* of places that don't have
> to know about it in the attached approach; in particular, just about
> every place that manipulates targetlist contents would have to also
> manipulate the secondary tlist.
>
> Another approach is to explicitly identify which of the Params will
> be evaluated first and replace it with a node tree that evaluates
> the subplan (and sets output Params for the remaining columns).
> This is a bit messy because the first-to-be-evaluated is dependent
> on the targetlist reordering that the planner does; so we don't want
> parse analysis to try to do it.  (If we allowed parse analysis to
> know that the planner will sort the tlist items into physical column
> order, we could do it like that; but then it would break if we ever
> get around to allowing ALTER TABLE to change the physical order.)
> We could safely have setrefs.c determine the first-to-be-evaluated
> Param, though, since it will traverse the tlist in final order.
> So if we went with this approach I'd have setrefs.c replace the first
> Param with a SubPlan node.  That seems a bit of a kluge too though.
>
> Preferences, comments, better ideas?
>
>                         regards, tom lane
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>

Re: [HACKERS] WIP patch for multiple column assignment in UPDATE

Reply via email to