Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
Merlin Moncure-2 wrote > On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja < > marko@ > > wrote: >> On 5/2/14, 10:10 PM, Merlin Moncure wrote: >>> >>> On Fri, May 2, 2014 at 3:03 PM, Tom Lane < > tgl@.pa > > wrote: Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... >>> >>> >>> That's already the case with select statements >> >> I don't think that's true if you table-qualify your column references and >> don't use SELECT *. >> >> >>> and, if a user were >>> concerned about that, always have the option of aliasing the table as >>> nearly 100% of professional developers do: >>> >>> SELECT f FROM foo f; >>> etc. >> >> >> So e.g.: >> >> UPDATE foo f SET f = ..; >> >> would resolve to the table, despite there being a column called "f"? That >> would break backwards compatibility. >> >> How about: >> >> UPDATE foo SET ROW(foo) = (1,2,3); >> >> ISTM that this could be parsed unambiguously, though it's perhaps a bit >> ugly. > > Hm, that's a bit too ugly: row(foo) in this case means 'do special > behavior X' whereas in all other cases it means make an anonymous > rowtype with one attribute of type 'foo'. > > How about: > UPDATE foo SET (foo).* = (1,2,3); Wouldn't UPDATE foo SET (foo.*) = (1,2,3) be better since it would cleanly support non-complete types like UPDATE foo SET (foo.col1, foo.col3) = (1,3) Though I am not that concerned about overloading the use of "ROW" in context of an UPDATE. As with normal usage of ROW why not make its presence optional - support both syntaxes? Keywords like USING and SET have different meanings when used in DELETE/UPDATE so having ROW behave similarly wouldn't be that confusing - and it does seem to have an ambiguity if you restrict this interpretation of ROW to only the SET of the update statement. Is there any need or requirement for (or against) interleaving normal and row-valued, or even multiple row-valued, SET expressions? UPDATE foo SET (foo.col1, foo.col3) = (1,3), foo.col2 = 2 David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Supporting-multiple-column-assignment-in-UPDATE-9-5-project-tp5802240p5802471.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Mon, May 5, 2014 at 10:32 AM, Andrew Dunstan wrote: > > On 05/05/2014 11:20 AM, Pavel Stehule wrote: >> >> >> >> >> How about: >> UPDATE foo SET (foo).* = (1,2,3); >> >> >> It is looking little bit strange >> >> I like previous proposal UPDATE foo SET foo = (1,2,3); >> > > What if the table has a field called foo? Won't it then be ambiguous? See upthread: it prefers the field to the table if both are there (exactly as SELECT does). merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On 05/05/2014 11:20 AM, Pavel Stehule wrote: How about: UPDATE foo SET (foo).* = (1,2,3); It is looking little bit strange I like previous proposal UPDATE foo SET foo = (1,2,3); What if the table has a field called foo? Won't it then be ambiguous? cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
2014-05-05 17:02 GMT+02:00 Merlin Moncure : > On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja wrote: > > On 5/2/14, 10:10 PM, Merlin Moncure wrote: > >> > >> On Fri, May 2, 2014 at 3:03 PM, Tom Lane wrote: > >>> > >>> Meh. Then you could have a query that works fine until you add a > column > >>> to the table, and it stops working. If nobody ever used column names > >>> identical to table names it'd be all right, but unfortunately people > >>> seem to do that a lot... > >> > >> > >> That's already the case with select statements > > > > I don't think that's true if you table-qualify your column references and > > don't use SELECT *. > > > > > >> and, if a user were > >> concerned about that, always have the option of aliasing the table as > >> nearly 100% of professional developers do: > >> > >> SELECT f FROM foo f; > >> etc. > > > > > > So e.g.: > > > > UPDATE foo f SET f = ..; > > > > would resolve to the table, despite there being a column called "f"? That > > would break backwards compatibility. > > > > How about: > > > > UPDATE foo SET ROW(foo) = (1,2,3); > > > > ISTM that this could be parsed unambiguously, though it's perhaps a bit > > ugly. > > Hm, that's a bit too ugly: row(foo) in this case means 'do special > behavior X' whereas in all other cases it means make an anonymous > rowtype with one attribute of type 'foo'. > > How about: > UPDATE foo SET (foo).* = (1,2,3); > It is looking little bit strange I like previous proposal UPDATE foo SET foo = (1,2,3); Pavel > > merlin > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers >
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Sat, May 3, 2014 at 5:48 AM, Marko Tiikkaja wrote: > On 5/2/14, 10:10 PM, Merlin Moncure wrote: >> >> On Fri, May 2, 2014 at 3:03 PM, Tom Lane wrote: >>> >>> Meh. Then you could have a query that works fine until you add a column >>> to the table, and it stops working. If nobody ever used column names >>> identical to table names it'd be all right, but unfortunately people >>> seem to do that a lot... >> >> >> That's already the case with select statements > > I don't think that's true if you table-qualify your column references and > don't use SELECT *. > > >> and, if a user were >> concerned about that, always have the option of aliasing the table as >> nearly 100% of professional developers do: >> >> SELECT f FROM foo f; >> etc. > > > So e.g.: > > UPDATE foo f SET f = ..; > > would resolve to the table, despite there being a column called "f"? That > would break backwards compatibility. > > How about: > > UPDATE foo SET ROW(foo) = (1,2,3); > > ISTM that this could be parsed unambiguously, though it's perhaps a bit > ugly. Hm, that's a bit too ugly: row(foo) in this case means 'do special behavior X' whereas in all other cases it means make an anonymous rowtype with one attribute of type 'foo'. How about: UPDATE foo SET (foo).* = (1,2,3); merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On 5/2/14, 10:10 PM, Merlin Moncure wrote: On Fri, May 2, 2014 at 3:03 PM, Tom Lane wrote: Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... That's already the case with select statements I don't think that's true if you table-qualify your column references and don't use SELECT *. and, if a user were concerned about that, always have the option of aliasing the table as nearly 100% of professional developers do: SELECT f FROM foo f; etc. So e.g.: UPDATE foo f SET f = ..; would resolve to the table, despite there being a column called "f"? That would break backwards compatibility. How about: UPDATE foo SET ROW(foo) = (1,2,3); ISTM that this could be parsed unambiguously, though it's perhaps a bit ugly. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Fri, May 2, 2014 at 3:03 PM, Tom Lane wrote: > Merlin Moncure writes: >> On Fri, May 2, 2014 at 2:47 PM, Tom Lane wrote: >>> But I don't think your suggestions of the table name or alias work; >>> they could conflict with an actual column name. > >> Presumably it'd follow similar rules to SELECT -- resolve the column >> name in the face of ambiguity. > > Meh. Then you could have a query that works fine until you add a column > to the table, and it stops working. If nobody ever used column names > identical to table names it'd be all right, but unfortunately people > seem to do that a lot... That's already the case with select statements and, if a user were concerned about that, always have the option of aliasing the table as nearly 100% of professional developers do: SELECT f FROM foo f; etc. Now, I need this feature a lot less than I used to (although I do like the symmetry with SELECT); hstore and jsonb have matured to the point that they can handle most trigger function operations that you'd want to abstract over multiple tables without expensive calls to information_schema. The main advantages for a native approach would be type safety (although even that situation is improving at long last), performance, and code complexity. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
Merlin Moncure writes: > On Fri, May 2, 2014 at 2:47 PM, Tom Lane wrote: >> But I don't think your suggestions of the table name or alias work; >> they could conflict with an actual column name. > Presumably it'd follow similar rules to SELECT -- resolve the column > name in the face of ambiguity. Meh. Then you could have a query that works fine until you add a column to the table, and it stops working. If nobody ever used column names identical to table names it'd be all right, but unfortunately people seem to do that a lot... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Fri, May 2, 2014 at 2:47 PM, Tom Lane wrote: > Merlin Moncure writes: >> 2) I often wish that you could reference the table (or it's alias) >> directly as the field list. > >> UPDATE foo f set f = (...)::foo; >> or even >> UPDATE foo SET foo = foo; > > Hm. You could get there with this syntax as long as you didn't mind > writing out the field list explicitly. Arguments why you should > want to do that are the same as for avoiding "SELECT *", with maybe > a bit more urgency since at least SELECT * won't trash your data > if you get it wrong. However, assuming that that argument doesn't > impress you ... > > My draft copy of SQL99 mentions a syntax > > UPDATE table SET ROW = [ WHERE ... ] > > which does not appear in later editions of the spec, and probably wasn't > in SQL99 final either (since SQL:2003 does not mention it as a removed > feature). I'm not sure we'd want to implement that; it would require > making ROW into a fully-reserved word, which it is not today, and that > seems rather a high price for implementing a not-per-spec feature. > But I don't think your suggestions of the table name or alias work; > they could conflict with an actual column name. Presumably it'd follow similar rules to SELECT -- resolve the column name in the face of ambiguity. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
Merlin Moncure writes: > On Fri, May 2, 2014 at 1:15 PM, Tom Lane wrote: >> I've been thinking about how we might implement the > assignment> UPDATE syntax that was introduced in SQL:2003. This feature >> allows you to do >> UPDATE table SET ..., (column, column, ...) = , ... > Couple quick questions: > 1) how does this interplay with RETURNING? I guess it probably > doesn't change, but I imagine there's be no way to reference the > composite result in the RETURNING statement? Not as such; obviously you could reference the assigned-to columns in RETURNING and thereby reconstruct the composite value. > 2) I often wish that you could reference the table (or it's alias) > directly as the field list. > UPDATE foo f set f = (...)::foo; > or even > UPDATE foo SET foo = foo; Hm. You could get there with this syntax as long as you didn't mind writing out the field list explicitly. Arguments why you should want to do that are the same as for avoiding "SELECT *", with maybe a bit more urgency since at least SELECT * won't trash your data if you get it wrong. However, assuming that that argument doesn't impress you ... My draft copy of SQL99 mentions a syntax UPDATE table SET ROW = [ WHERE ... ] which does not appear in later editions of the spec, and probably wasn't in SQL99 final either (since SQL:2003 does not mention it as a removed feature). I'm not sure we'd want to implement that; it would require making ROW into a fully-reserved word, which it is not today, and that seems rather a high price for implementing a not-per-spec feature. But I don't think your suggestions of the table name or alias work; they could conflict with an actual column name. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
On Fri, May 2, 2014 at 1:15 PM, Tom Lane wrote: > I've been thinking about how we might implement the assignment> UPDATE syntax that was introduced in SQL:2003. This feature > allows you to do > > UPDATE table SET ..., (column, column, ...) = , ... > > where the system arranges to evaluate the just > once per row and then assign its fields into the specified target columns. > > Back in commit 6e8596a146c9b16f2c053ea8c6e361a114c0b65c we introduced > some limited support for this syntax, but it only handles a expression> that is a ROW() constructor, and it just does a simple > syntactic transformation of pulling apart the ROW() constructor and > building an independent assignment to each target column. The actually > interesting uses for this feature don't work with that implementation > approach. The most common case I've seen asked for is where the > expression is a sub-SELECT returning multiple columns (but at most one > row). > > As far as the parser is concerned, the main hurdle to supporting this > feature is that the representation of an UPDATE's targetlist assumes that > each list element is an independent TargetEntry representing a single > assignment. Now, there is a heck of a lot of code that knows what > targetlists look like, so I'm not eager to try to change that basic > assumption. What seems like probably a better idea is to represent > SET (target1, target2, target3) = foo > as though it were > SET target1 = fooref.col1, target2 = fooref.col2, target3 = fooref.col3 > where fooref is some Param-like reference to a separate list of > expressions that have composite outputs. It would be understood that this > separate targetlist would be evaluated just once before evaluating the > main tlist. This approach would allow all the existing targetlist > manipulation code to stay about the same. It would be a bit of a > challenge for ruleutils.c to reconstruct the original syntax when printing > an UPDATE in a rule, but I think that's just a small matter of > programming. (Possibly it would help if the elements of the separate > composite-values targetlist contained markers as to which main-tlist > elements they were for.) > > Now, we could probably implement it straightforwardly just based on that > idea, though it's not quite clear where to shoehorn evaluation of the > separate targetlist into the constructed plan. One way would be to > insert an additional evaluation level by adding a Result node on top > of the normal plan, and then have the lower level compute the composite > values as resjunk tlist elements, while the upper level does FieldSelects > from the composite values to implement the "fooref.colN" references. > > However, I'm mainly interested in the sub-SELECT case; indeed, anything > else you might want to do could be transformed into a sub-SELECT, so > I wouldn't feel bad if we just restricted the new feature to that. > And this doesn't seem like quite the right way to do it for sub-SELECTs. > > In the case of sub-SELECTs, we have almost the right execution mechanism > already, in that initPlans are capable of setting multiple PARAM_EXEC > runtime Params, one for each output column of the sub-SELECT. So what > I called "fooref.col1" etc above could just be PARAM_EXEC Params referring > to the subplan outputs --- except that initPlans are only for uncorrelated > subqueries (those without any outer references to Vars of the parent query > level). And the interesting cases for UPDATE generally involve correlated > subqueries. > > What I'm thinking about this is that we ought to make an effort to unify > the currently separate implementation paths for correlated and > uncorrelated subqueries. Instead of SubPlans in the expression tree for > correlated subqueries, I think they should all be treated much like > initPlans are now, ie, there are PARAM_EXEC Params referencing outputs > from a list of subqueries that are attached to the expression tree's > parent plan node, and we lazily evaluate the subqueries upon first use of > one of their output parameters. What would be different from the current > handling of initPlans is that each time we advance to a new input row, > we'd need to reset the evaluation state of the subqueries that are > correlated. The reason for changing it like that is so that we can have > multiple separate Params referencing different output columns of a single > correlated subquery, and be sure that we evaluate the correlated subquery > only once; the current SubPlan mechanism can't support separate references > to the same subplan. Now, this would add a small amount of new bookkeeping > overhead to use of correlated subqueries, but I find it hard to believe > that that'd be noticeable compared to the startup/shutdown cost of the > subquery. > > So, if we were to revise the handling of correlated subqueries like that, > then for the case of a that is a sub-SELECT we > wouldn't need any explicit runtime evaluation of "separate targetlist"
[HACKERS] Supporting multiple column assignment in UPDATE (9.5 project)
I've been thinking about how we might implement the UPDATE syntax that was introduced in SQL:2003. This feature allows you to do UPDATE table SET ..., (column, column, ...) = , ... where the system arranges to evaluate the just once per row and then assign its fields into the specified target columns. Back in commit 6e8596a146c9b16f2c053ea8c6e361a114c0b65c we introduced some limited support for this syntax, but it only handles a that is a ROW() constructor, and it just does a simple syntactic transformation of pulling apart the ROW() constructor and building an independent assignment to each target column. The actually interesting uses for this feature don't work with that implementation approach. The most common case I've seen asked for is where the expression is a sub-SELECT returning multiple columns (but at most one row). As far as the parser is concerned, the main hurdle to supporting this feature is that the representation of an UPDATE's targetlist assumes that each list element is an independent TargetEntry representing a single assignment. Now, there is a heck of a lot of code that knows what targetlists look like, so I'm not eager to try to change that basic assumption. What seems like probably a better idea is to represent SET (target1, target2, target3) = foo as though it were SET target1 = fooref.col1, target2 = fooref.col2, target3 = fooref.col3 where fooref is some Param-like reference to a separate list of expressions that have composite outputs. It would be understood that this separate targetlist would be evaluated just once before evaluating the main tlist. This approach would allow all the existing targetlist manipulation code to stay about the same. It would be a bit of a challenge for ruleutils.c to reconstruct the original syntax when printing an UPDATE in a rule, but I think that's just a small matter of programming. (Possibly it would help if the elements of the separate composite-values targetlist contained markers as to which main-tlist elements they were for.) Now, we could probably implement it straightforwardly just based on that idea, though it's not quite clear where to shoehorn evaluation of the separate targetlist into the constructed plan. One way would be to insert an additional evaluation level by adding a Result node on top of the normal plan, and then have the lower level compute the composite values as resjunk tlist elements, while the upper level does FieldSelects from the composite values to implement the "fooref.colN" references. However, I'm mainly interested in the sub-SELECT case; indeed, anything else you might want to do could be transformed into a sub-SELECT, so I wouldn't feel bad if we just restricted the new feature to that. And this doesn't seem like quite the right way to do it for sub-SELECTs. In the case of sub-SELECTs, we have almost the right execution mechanism already, in that initPlans are capable of setting multiple PARAM_EXEC runtime Params, one for each output column of the sub-SELECT. So what I called "fooref.col1" etc above could just be PARAM_EXEC Params referring to the subplan outputs --- except that initPlans are only for uncorrelated subqueries (those without any outer references to Vars of the parent query level). And the interesting cases for UPDATE generally involve correlated subqueries. What I'm thinking about this is that we ought to make an effort to unify the currently separate implementation paths for correlated and uncorrelated subqueries. Instead of SubPlans in the expression tree for correlated subqueries, I think they should all be treated much like initPlans are now, ie, there are PARAM_EXEC Params referencing outputs from a list of subqueries that are attached to the expression tree's parent plan node, and we lazily evaluate the subqueries upon first use of one of their output parameters. What would be different from the current handling of initPlans is that each time we advance to a new input row, we'd need to reset the evaluation state of the subqueries that are correlated. The reason for changing it like that is so that we can have multiple separate Params referencing different output columns of a single correlated subquery, and be sure that we evaluate the correlated subquery only once; the current SubPlan mechanism can't support separate references to the same subplan. Now, this would add a small amount of new bookkeeping overhead to use of correlated subqueries, but I find it hard to believe that that'd be noticeable compared to the startup/shutdown cost of the subquery. So, if we were to revise the handling of correlated subqueries like that, then for the case of a that is a sub-SELECT we wouldn't need any explicit runtime evaluation of "separate targetlist" entries. Use of Params referencing the subplan's outputs would be enough to cause evaluation to happen at the right times. At least for the first cut, I think it'd be enough to stop there, though maybe later we could deal w