Re: [HACKERS] Changed SRF in targetlist handling
Robert Haas writes: > Tom, it's been about 3.5 months since you wrote this. I think it > would be really valuable if you could get to this RSN because the > large patch set posted on the "Changed SRF in targetlist handling" > thread is backed up behind this -- and I think that's really valuable > work which I don't want to see slip out of this release. Yeah, I was busy with other stuff during the recent commitfest. I'll try to get back to this. There's still only 24 hours in a day, though. (And no, [1] is not enough to help.) regards, tom lane [1] https://www.theguardian.com/science/2016/dec/07/earths-day-lengthens-by-two-milliseconds-a-century-astronomers-find -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, Aug 22, 2016 at 4:20 PM, Tom Lane wrote: > Andres Freund writes: >> On 2016-08-17 17:41:28 -0700, Andres Freund wrote: >>> Tom, do you think this is roughly going in the right direction? > > I've not had time to look at this patch, I'm afraid. If you still > want me to, I can make time in a day or so. Tom, it's been about 3.5 months since you wrote this. I think it would be really valuable if you could get to this RSN because the large patch set posted on the "Changed SRF in targetlist handling" thread is backed up behind this -- and I think that's really valuable work which I don't want to see slip out of this release. At the same time, both that and this are quite invasive, and I don't want it all to get committed the day before feature freeze, because that will mess up the schedule. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Wed, Aug 24, 2016 at 3:55 AM, Andres Freund wrote: > Comments? This thread has no activity for some time now and it is linked to this CF entry: https://commitfest.postgresql.org/10/759/ I am marking it as returned with feedback.. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-08-17 17:41:28 -0700, Andres Freund wrote: > a) Allow to avoid using a tuplestore for SRF_PERCALL SRFs in ROWS FROM - >otherwise our performance would regress noticeably in some cases. To demonstrate the problem: master: =# COPY (SELECT generate_series(1, 5000)) TO '/dev/null'; COPY 5000 Time: 6859.830 ms =# COPY (SELECT * FROM generate_series(1, 5000)) TO '/dev/null'; COPY 5000 Time: 11314.507 ms getting rid of the materialization indeed fixes the problem: dev: =# COPY (SELECT generate_series(1, 5000)) TO '/dev/null'; COPY 5000 Time: 5757.547 ms =# COPY (SELECT * FROM generate_series(1, 5000)) TO '/dev/null'; COPY 5000 Time: 5842.524 ms I've currently implemented this by having nodeFunctionscan.c store enough state in FunctionScanPerFuncState to continue the ValuePerCall protocol. That all seems to work well, without big problems. The open issue here is whether / how we want to deal with EXEC_FLAG_REWIND and EXEC_FLAG_BACKWARD. Currently that, with some added complications, is implemented in nodeFunctionscan.c itself. But for ValuePerCall SRFs that doesn't directly work anymore. ISTM that the easiest way here is actually to rip out support for EXEC_FLAG_REWIND/EXEC_FLAG_BACKWARD from nodeFunctionscan.c. If the plan requires that, the planner will slap a Material node on-top. Which will even be more efficient when ROWS FROM for multiple SRFs, or WITH ORDINALITY, are used. Alternatively we can continue to create a tuplestore for ValuePerCall when eflags indicates that's required. But personally I don't see that as an advantageous course. Comments? Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Hi, On 2016-08-22 16:20:58 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2016-08-17 17:41:28 -0700, Andres Freund wrote: > >> Tom, do you think this is roughly going in the right direction? > > I've not had time to look at this patch, I'm afraid. If you still > want me to, I can make time in a day or so. That'd greatly be appreciated. I think polishing the POC up to committable patch will be a considerable amount of work, and I'd like design feedback before that. > > I'm working on these. Atm ExecMakeTableFunctionResult() resides in > > execQual.c - I'm inlining it into nodeFunctionscan.c now, because > > there's no other callers, and having it separate seems to bring no > > benefit. > > > Please speak soon up if you disagree. > > I think ExecMakeTableFunctionResult was placed in execQual.c because > it seemed to belong there alongside the support for SRFs in tlists. > If that's going away then there's no good reason not to move the logic > to where it's used. Cool, then we agree. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Andres Freund writes: > On 2016-08-17 17:41:28 -0700, Andres Freund wrote: >> Tom, do you think this is roughly going in the right direction? I've not had time to look at this patch, I'm afraid. If you still want me to, I can make time in a day or so. > I'm working on these. Atm ExecMakeTableFunctionResult() resides in > execQual.c - I'm inlining it into nodeFunctionscan.c now, because > there's no other callers, and having it separate seems to bring no > benefit. > Please speak soon up if you disagree. I think ExecMakeTableFunctionResult was placed in execQual.c because it seemed to belong there alongside the support for SRFs in tlists. If that's going away then there's no good reason not to move the logic to where it's used. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Hi, On 2016-05-23 09:26:03 +0800, Craig Ringer wrote: > SRFs-in-tlist are a lot faster for lockstep iteration etc. They're also > much simpler to write, though if the result result rowcount differs > unexpectedly between the functions you get exciting and unexpected > behaviour. > > WITH ORDINALITY provides what I think is the last of the functionality > needed to replace SRFs-in-from, but at a syntatactic complexity and > performance cost. The following example demonstrates that, though it > doesn't do anything that needs LATERAL etc. I'm aware the following aren't > semantically identical if the rowcounts differ. I think here you're just missing ROWS FROM (generate_series(..), generate_series(...)) Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-08-17 17:41:28 -0700, Andres Freund wrote: > Tom, do you think this is roughly going in the right direction? My plan > here is to develop two patches, to come before this: > > a) Allow to avoid using a tuplestore for SRF_PERCALL SRFs in ROWS FROM - >otherwise our performance would regress noticeably in some cases. > b) Allow ROWS FROM() to return SETOF RECORD type SRFs as one column, >instead of expanded. That's important to be able move SETOF RECORD >returning functions in the targetlist into ROWS FROM, which otherwise >requires an explicit column list. I'm working on these. Atm ExecMakeTableFunctionResult() resides in execQual.c - I'm inlining it into nodeFunctionscan.c now, because there's no other callers, and having it separate seems to bring no benefit. Please speak soon up if you disagree. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-08-03 20:22:03 -0700, Andres Freund wrote: > On 2016-08-02 16:30:55 -0700, Andres Freund wrote: > > > > Besides that I'm structurally wondering whether turning the original > > > > query into a subquery is the right thing to do. It requires some kind of > > > > ugly munching of Query->*, and has the above problem. > > > > > > It does not seem like it should be that hard, certainly no worse than > > > subquery pullup. Want to show code? > > > > It's not super hard, there's some stuff like pushing/not-pushing > > various sortgrouprefs to the subquery. But I think we can live with it. > > > > Let me clean up the code some, hope to have something today or > > tomorrow. > > Here we go. This *clearly* is a POC, not more. But it mostly works. > > > 0001 - adds some test, some of those change after the later patches > 0002 - main SRF via ROWS FROM () implementation > 0003 - Large patch removing now unused code. Most satisfying. > > > The interesting bit is obviously 0002. What it basically does is, at the > beginning > of subquery_planner(): > 1) unsrfify: >move the jointree into a subquery > 2) unsrfify_reference_subquery_mutator: >process the old targetlist to reference the new subquery. If a >TargetEntry doesn't contain a set, it's entirely moved into the >subquery. Otherwise all Vars/Aggrefs/... it references are moved to >the subquery, and referenced in the outer query's target list. > 3) unsrfify_implement_srfs_mutator: >Replace set returning functions in the targetlist with references to >a new FUNCTION RTE. All non-nested tSRFs are part of the same RTE >(i.e. the least common multiple behaviour is gone). all tSRFs in >arguments are implemented as another FUNCTION RTE. > > I discovered that we allow SRFs in UPDATE target lists. It's not clear > to me what that's supposed to mean. Nor how exactly to implement that, > given expand_targetlist(). Right now that fails with the patch, because > it re-inserts Var's for the relation replaced by the subquery. > > Note that I've not bothered to fix up the regression test output - I'm > certain that explain output and such will still change. > > Biggest questions / tasks: > * General approach > * DML handling > * Operator implementation > * SETOF record handling > * correct handling of lateral dependency from RTE to subquery to force > evaluation order, instead of my RangeTblEntry->deps hack. > * lot of cleanup > > Comments? Tom, do you think this is roughly going in the right direction? My plan here is to develop two patches, to come before this: a) Allow to avoid using a tuplestore for SRF_PERCALL SRFs in ROWS FROM - otherwise our performance would regress noticeably in some cases. b) Allow ROWS FROM() to return SETOF RECORD type SRFs as one column, instead of expanded. That's important to be able move SETOF RECORD returning functions in the targetlist into ROWS FROM, which otherwise requires an explicit column list. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-08-02 19:02:38 -0400, Tom Lane wrote: > Andres Freund writes: > > I've an implementation that > > > 1) turns all targetlist SRF (tSRF from now on) into ROWS FROM > >expressions. If there's tSRFs in the argument of a tSRF those becomes > >a separate, lateral, ROWS FROM expression. > > > 2) If grouping/window functions are present, the entire query is wrapped > >in a subquery RTE, except for the set-returning function. All > >referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the > >original targetlist are made to reference that subquery, which gets a > >TargetEntry for them. > > FWIW, I'd be inclined to do the subquery RTE all the time, Yea, that's what I ended up doing. > adding some > optimization fence to ensure it doesn't get folded back. That fixes > your problem here: > > So far I have one problem without an easy solution: Historically queries > > like > > =# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id); > > ┌┬─┐ > > │ id │ generate_series │ > > ├┼─┤ > > │ 1 │ 1 │ > > │ 1 │ 2 │ > > │ 2 │ 1 │ > > │ 2 │ 2 │ > > └┴─┘ > > have preserved the SRF output ordering. But by turning the SRF into a > > ROWS FROM, there's no guarantee that the cross join between "few" and > > generate_series(1,3) above is implemented in that order. But I don't see how that fixes the above problem? The join, on the top-level because of aggregates, still can be implemented as subquery join srf or as srf join subquery, with the different output order that implies. I've duct-taped together a solution for that, by forcing the lateral machinery to always see a dependency from the SRF to the subquery; but that probably needs a nicer fix than a RangeTblEntry->deps field which is processed in extract_lateral_references() ;) > > Besides that I'm structurally wondering whether turning the original > > query into a subquery is the right thing to do. It requires some kind of > > ugly munching of Query->*, and has the above problem. > > It does not seem like it should be that hard, certainly no worse than > subquery pullup. Want to show code? It's not super hard, there's some stuff like pushing/not-pushing various sortgrouprefs to the subquery. But I think we can live with it. Let me clean up the code some, hope to have something today or tomorrow. > > An alternative approach would be to do this during parse-analysis, but I > > think that might end up being confusing, because stored rules would > > suddenly have a noticeably different structure, and it'd tie us a lot > > more to the details of that transformation than I'd like. > > -1 on that; we do not want this transformation visible in stored rules. Agreed. > > Besides the removal of the least-common-multiple behaviour of tSRF queries, > > there's some other consequences that using function scans have: > > Previously if a tSRF was never evaluated, it didn't cause the number of > > rows from being increased. E.g. > > SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id); > > only produced two rows. But using joins means that a simple > > implementation of using ROWS FROM returns four rows. > > Hmm. I don't mind changing behavior in that sort of corner case. > If we're prepared to discard the LCM behavior, this seems at least > an order of magnitude less likely to be worth worrying about. I think it's fine, and potentially less confusing. > Would it be worth detecting SRFs below CASE/COALESCE/etc and throwing > an error? It would be easier to sell throwing an error than silently > changing behavior, I think. Hm. We could, but I think the new behaviour would actually make sense in the long run. Interpreting the coalesce to run on the output of the SRF doesn't seem bad to me. I found another edgecase, which we need to make a decision about. 'record' returning SRFs can't be transformed easily into a ROWS FROM. Consider e.g. the following from the regression tests: create function array_to_set(anyarray) returns setof record as $$ select i AS "index", $1[i] AS "value" from generate_subscripts($1, 1) i $$ language sql strict immutable; select array_to_set(array['one', 'two']); ┌──┐ │ array_to_set │ ├──┤ │ (1,one) │ │ (2,two) │ └──┘ (2 rows) which currently works. That currently can't be modeled as ROWS FROM() directly, because that desperately wants to return the columns as columns, which we can't do for 'record' returning things, because they don't have defined columns. For composite returning SRFs I've currently implemented that by generating a ROWS() expression, but that doesn't work for record. So it seems like we need some, not necessarily user exposed, way of making nodeFunctionscan.c return the return value as one datum. One way, as suggested by Andrew G. on IRC, was to interpret empty column definition in ROWS FROM
Re: [HACKERS] Changed SRF in targetlist handling
Andres Freund writes: > I've an implementation that > 1) turns all targetlist SRF (tSRF from now on) into ROWS FROM >expressions. If there's tSRFs in the argument of a tSRF those becomes >a separate, lateral, ROWS FROM expression. > 2) If grouping/window functions are present, the entire query is wrapped >in a subquery RTE, except for the set-returning function. All >referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the >original targetlist are made to reference that subquery, which gets a >TargetEntry for them. FWIW, I'd be inclined to do the subquery RTE all the time, adding some optimization fence to ensure it doesn't get folded back. That fixes your problem here: > So far I have one problem without an easy solution: Historically queries > like > =# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id); > ââââââ¬ââââââââââââââââââ > â id â generate_series â > ââââââ¼âââââââââââââââââ⤠> â 1 â 1 â > â 1 â 2 â > â 2 â 1 â > â 2 â 2 â > ââââââ´ââââââââââââââââââ > have preserved the SRF output ordering. But by turning the SRF into a > ROWS FROM, there's no guarantee that the cross join between "few" and > generate_series(1,3) above is implemented in that order. > Besides that I'm structurally wondering whether turning the original > query into a subquery is the right thing to do. It requires some kind of > ugly munching of Query->*, and has the above problem. It does not seem like it should be that hard, certainly no worse than subquery pullup. Want to show code? > An alternative approach would be to do this during parse-analysis, but I > think that might end up being confusing, because stored rules would > suddenly have a noticeably different structure, and it'd tie us a lot > more to the details of that transformation than I'd like. -1 on that; we do not want this transformation visible in stored rules. > Besides the removal of the least-common-multiple behaviour of tSRF queries, > there's some other consequences that using function scans have: > Previously if a tSRF was never evaluated, it didn't cause the number of > rows from being increased. E.g. > SELECT id, COALESCE(1, generate_series(1,2)) FROM (VALUES(1),(2)) few(id); > only produced two rows. But using joins means that a simple > implementation of using ROWS FROM returns four rows. Hmm. I don't mind changing behavior in that sort of corner case. If we're prepared to discard the LCM behavior, this seems at least an order of magnitude less likely to be worth worrying about. Having said that, I do seem to recall a bug report about misbehavior when a SRF was present in just one arm of a CASE statement. That would have the same type of behavior as you describe here, and evidently there's at least one person out there depending on it. Would it be worth detecting SRFs below CASE/COALESCE/etc and throwing an error? It would be easier to sell throwing an error than silently changing behavior, I think. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-05-25 16:55:23 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2016-05-25 15:20:03 -0400, Tom Lane wrote: > >> We could certainly make a variant behavior in nodeFunctionscan.c that > >> emulates that, if we feel that being exactly bug-compatible on the point > >> is actually what we want. I'm dubious about that though, not least > >> because I don't think *anyone* actually believes that that behavior isn't > >> broken. Did you read my upthread message suggesting assorted compromise > >> choices? > > > You mean > > https://www.postgresql.org/message-id/21076.1464034...@sss.pgh.pa.us ? > > If so, yes. > > > If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by > > option 1), that'd keep most of the functionality, and would break > > visibly rather than invisibly in the cases where not. > > 2.5 would be okay with me. > > > I guess you're not planning to work on this? > > Well, not right now, as it's clearly too late for 9.6. I might hack on > it later if nobody beats me to it. FWIW, as it's blocking my plans for executor related rework (expression evaluation, batch processing) I started to hack on this. I've an implementation that 1) turns all targetlist SRF (tSRF from now on) into ROWS FROM expressions. If there's tSRFs in the argument of a tSRF those becomes a separate, lateral, ROWS FROM expression. 2) If grouping/window functions are present, the entire query is wrapped in a subquery RTE, except for the set-returning function. All referenced Var|Aggref|GroupingFunc|WindowFunc|Param nodes in the original targetlist are made to reference that subquery, which gets a TargetEntry for them. 3) If sortClause does *not* reference any tSRFs the sorting is evaluated in a subquery, to preserve the output ordering of SRFs in queries like SELECT id, generate_series(1,3) FROM (VALUES(1),(2)) d(id) ORDER BY id DESC; if in contrast sortClause does reference the tSRF output, it's evaluated in the outer SRF. this seems to generally work, and allows to remove considerable amounts of code. So far I have one problem without an easy solution: Historically queries like =# SELECT id, generate_series(1,2) FROM (VALUES(1),(2)) few(id); ┌┬─┐ │ id │ generate_series │ ├┼─┤ │ 1 │ 1 │ │ 1 │ 2 │ │ 2 │ 1 │ │ 2 │ 2 │ └┴─┘ have preserved the SRF output ordering. But by turning the SRF into a ROWS FROM, there's no guarantee that the cross join between "few" and generate_series(1,3) above is implemented in that order. I.e. we can get something like ┌┬─┐ │ id │ generate_series │ ├┼─┤ │ 1 │ 1 │ │ 2 │ 1 │ │ 1 │ 2 │ │ 2 │ 2 │ └┴─┘ because it's implemented as ┌──┐ │ QUERY PLAN │ ├──┤ │ Nested Loop (cost=0.00..35.03 rows=2000 width=8)│ │ -> Function Scan on generate_series (cost=0.00..10.00 rows=1000 width=4) │ │ -> Materialize (cost=0.00..0.04 rows=2 width=4) │ │ -> Values Scan on "*VALUES*" (cost=0.00..0.03 rows=2 width=4) │ └──┘ I right now see no easy and nice-ish way to constrain that. Besides that I'm structurally wondering whether turning the original query into a subquery is the right thing to do. It requires some kind of ugly munching of Query->*, and has the above problem. One alternative would be to instead perform the necessary magic in grouping_planner(), by "manually" adding nestloop joins before/after create_ordered_paths() (depending on SRFs being referenced in the sort clause). That'd create plans we'd not have created so far, by layering NestLoop and FunctionScan nodes above the normal query - that'd allow us to to easily force the ordering of SRF evaluation. If we go the subquery route, I'm wondering about where to tackle the restructuring. So far I'm doing it very early in subquery_planner() - otherwise the aggregation/sorting/... behaviour is easier to handle. Perhaps doing it in standard_planner() itself would be better though. An alternative approach would be to do this during parse-analysis, but I think that might end up being confusing, because stored rules would suddenly have a noticeably different structure, and it'd tie us a lot more to the details of that transformation than I'd like. Besides the removal of the least-common-multiple behaviour of tSRF queries, there's some other consequences that using function scans have: Previously if a tSRF was never evaluated, it didn't cause the number of rows from being increased. E.g. SELECT id, C
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, Jun 6, 2016 at 3:26 PM, Tom Lane wrote: > Robert Haas writes: >> ... I guess I'd prefer #2 to #2.5, #2.5 to #3, and #3 to #1. >> I really don't like #1 much - I think I'd almost rather do nothing. > > FWIW, that's about my evaluation of the alternatives as well. I fear > that #1 would get a lot of pushback. If we think that something like > "LATERAL ROWS FROM STRICT" is worth having on its own merits, then > doing #2.5 seems worthwhile to me, but otherwise I'm just as happy > with #2. David J. seems to feel that throwing an error (as in #2.5) > rather than silently behaving incompatibly (as in #2) is important, > but I'm not convinced. In a green field I think we'd prefer #2 over > #2.5, so I'd rather go that direction. Same here. That behavior is actually potentially quite useful, right? Like, you might want to rely on the NULL-extension thing, if it were documented as behavior you can count on? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, Jun 6, 2016 at 3:26 PM, Tom Lane wrote: > Robert Haas writes: > > ... I guess I'd prefer #2 to #2.5, #2.5 to #3, and #3 to #1. > > I really don't like #1 much - I think I'd almost rather do nothing. > > FWIW, that's about my evaluation of the alternatives as well. I fear > that #1 would get a lot of pushback. If we think that something like > "LATERAL ROWS FROM STRICT" is worth having on its own merits, then > doing #2.5 seems worthwhile to me, but otherwise I'm just as happy > with #2. David J. seems to feel that throwing an error (as in #2.5) > rather than silently behaving incompatibly (as in #2) is important, > but I'm not convinced. In a green field I think we'd prefer #2 over > #2.5, so I'd rather go that direction. > I suspect the decision to error or not is a one or two line change in whatever form the final patch takes. It seems like approach #2 is acceptable on a theoretical level which implies there is no desire to make the existing LCM behavior available post-patch. Assuming it is simple then everyone will have a chance to make their opinion known on whether the 2.0 or 2.5 variation is preferable for the final commit. If a decision needs to be made sooner due to a design decision I'd hope the author of the patch would make that known so we can bring this to resolution at that point instead. David J.
Re: [HACKERS] Changed SRF in targetlist handling
Robert Haas writes: > ... I guess I'd prefer #2 to #2.5, #2.5 to #3, and #3 to #1. > I really don't like #1 much - I think I'd almost rather do nothing. FWIW, that's about my evaluation of the alternatives as well. I fear that #1 would get a lot of pushback. If we think that something like "LATERAL ROWS FROM STRICT" is worth having on its own merits, then doing #2.5 seems worthwhile to me, but otherwise I'm just as happy with #2. David J. seems to feel that throwing an error (as in #2.5) rather than silently behaving incompatibly (as in #2) is important, but I'm not convinced. In a green field I think we'd prefer #2 over #2.5, so I'd rather go that direction. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, Jun 6, 2016 at 2:53 PM, Tom Lane wrote: > Now, if we decide to try to rewrite tlist SRFs as LATERAL, it would likely > behoove us to do that rewrite before expanding * not after, so that we can > eliminate the multiple evaluation of foo() that happens currently. (That > makes it a parser problem not a planner problem.) And maybe we should > rewrite non-SRF composite-returning functions this way too, because people > have definitely complained about the extra evaluations in that context. > But my point here is that lockstep evaluation does have practical use > when the SRFs are iterating over matching collections of generated rows. > And that seems like a pretty common use-case. Yeah, OK. I'm not terribly opposed to going that way. I think the current behavior sucks badly enough - both because the semantics are bizarre and because it complicates the whole executor for a niche feature - that it's worth taking a backward compatibility hit to change it. I guess I'd prefer #2 to #2.5, #2.5 to #3, and #3 to #1. I really don't like #1 much - I think I'd almost rather do nothing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Alvaro Herrera writes: > Robert Haas wrote: >> On Mon, Jun 6, 2016 at 11:50 AM, Tom Lane wrote: >>> No, because then you get the cross-product of multiple SRFs, not the >>> run-in-lockstep behavior. >> Oh. I assumed that was the expected behavior. But, ah, what do I know? > Lots, I assume -- but in this case, probably next to nothing, just like > most of us, because what sane person or application would be really > relying on the wacko historical behavior, in order to generate some > collective knowledge? However, I think that it is possible that > someone, somewhere has two SRFs-in-targetlist that return the same > number of rows and that the current implementation works fine for them; Yes. Run-in-lockstep is an extremely useful behavior, so much so that we made a LATERAL variant for it. I do not see a reason to break such cases in the targetlist. > My vote is to raise an error in the case of more than one SRF in targetlist. Note that that risks breaking cases that the user does not think are "more than one SRF". Consider this example using a regression-test table: regression=# create function foo() returns setof int8_tbl as regression-# 'select * from int8_tbl' language sql; CREATE FUNCTION regression=# select foo(); foo -- (123,456) (123,4567890123456789) (4567890123456789,123) (4567890123456789,4567890123456789) (4567890123456789,-4567890123456789) (5 rows) regression=# explain verbose select foo(); QUERY PLAN -- Result (cost=0.00..5.25 rows=1000 width=32) Output: foo() (2 rows) regression=# select (foo()).*; q1|q2 --+--- 123 | 456 123 | 4567890123456789 4567890123456789 | 123 4567890123456789 | 4567890123456789 4567890123456789 | -4567890123456789 (5 rows) regression=# explain verbose select (foo()).*; QUERY PLAN -- Result (cost=0.00..5.50 rows=1000 width=16) Output: (foo()).q1, (foo()).q2 (2 rows) The reason we can get away with this simplistic treatment of composite-returning SRFs is precisely the run-in-lockstep behavior. Otherwise the second query would have returned 25 rows. Now, if we decide to try to rewrite tlist SRFs as LATERAL, it would likely behoove us to do that rewrite before expanding * not after, so that we can eliminate the multiple evaluation of foo() that happens currently. (That makes it a parser problem not a planner problem.) And maybe we should rewrite non-SRF composite-returning functions this way too, because people have definitely complained about the extra evaluations in that context. But my point here is that lockstep evaluation does have practical use when the SRFs are iterating over matching collections of generated rows. And that seems like a pretty common use-case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, Jun 6, 2016 at 2:31 PM, Alvaro Herrera wrote: > Robert Haas wrote: > > On Mon, Jun 6, 2016 at 11:50 AM, Tom Lane wrote: > > > Robert Haas writes: > > >> On Mon, May 23, 2016 at 4:15 PM, Tom Lane wrote: > > >>> 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would > > >>> have the same behavior as before if the SRFs all return the same > number > > >>> of rows, and otherwise would behave differently. > > > > > >> I thought the idea was to rewrite it as LATERAL ROWS FROM (srf1()), > > >> LATERAL ROWS FROM (srf2()), ... > > > > > > No, because then you get the cross-product of multiple SRFs, not the > > > run-in-lockstep behavior. > > > > Oh. I assumed that was the expected behavior. But, ah, what do I know? > > Lots, I assume -- but in this case, probably next to nothing, just like > most of us, because what sane person or application would be really > relying on the wacko historical behavior, in order to generate some > collective knowledge? However, I think that it is possible that > someone, somewhere has two SRFs-in-targetlist that return the same > number of rows and that the current implementation works fine for them; > if we redefine it to work differently, we would break their application > silently, which seems a worse problem than breaking it noisily while > providing an easy way forward (which is to move SRFs to the FROM list) > > My vote is to raise an error in the case of more than one SRF in > targetlist. > As long as someone is willing to put in the effort we can make a subset of these multiple-SRFs-in-targetlist queries work without any change in the tabular output, though the processing mechanism might change. Your vote is essentially #1 up-thread which seems the most draconian. Assuming a viable option 2.5 or 3 solution is presented would you vote against it being committed? If so I'd like to understand why. I see #1 as basically OK only if their are technical barriers we cannot overcome - including performance. Link to the definition of the various options Tom proposed: https://www.postgresql.org/message-id/21076.1464034513%40sss.pgh.pa.us David J.
Re: [HACKERS] Changed SRF in targetlist handling
Robert Haas wrote: > On Mon, Jun 6, 2016 at 11:50 AM, Tom Lane wrote: > > Robert Haas writes: > >> On Mon, May 23, 2016 at 4:15 PM, Tom Lane wrote: > >>> 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would > >>> have the same behavior as before if the SRFs all return the same number > >>> of rows, and otherwise would behave differently. > > > >> I thought the idea was to rewrite it as LATERAL ROWS FROM (srf1()), > >> LATERAL ROWS FROM (srf2()), ... > > > > No, because then you get the cross-product of multiple SRFs, not the > > run-in-lockstep behavior. > > Oh. I assumed that was the expected behavior. But, ah, what do I know? Lots, I assume -- but in this case, probably next to nothing, just like most of us, because what sane person or application would be really relying on the wacko historical behavior, in order to generate some collective knowledge? However, I think that it is possible that someone, somewhere has two SRFs-in-targetlist that return the same number of rows and that the current implementation works fine for them; if we redefine it to work differently, we would break their application silently, which seems a worse problem than breaking it noisily while providing an easy way forward (which is to move SRFs to the FROM list) My vote is to raise an error in the case of more than one SRF in targetlist. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, Jun 6, 2016 at 11:50 AM, Tom Lane wrote: > Robert Haas writes: >> On Mon, May 23, 2016 at 4:15 PM, Tom Lane wrote: >>> 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would >>> have the same behavior as before if the SRFs all return the same number >>> of rows, and otherwise would behave differently. > >> I thought the idea was to rewrite it as LATERAL ROWS FROM (srf1()), >> LATERAL ROWS FROM (srf2()), ... > > No, because then you get the cross-product of multiple SRFs, not the > run-in-lockstep behavior. Oh. I assumed that was the expected behavior. But, ah, what do I know? >> The rewrite you propose here seems to NULL-pad rows after the first >> SRF is exhausted: > > Yes. That's why I said it's not compatible if the SRFs don't all return > the same number of rows. It seems like a reasonable definition to me > though, certainly much more reasonable than the current run-until-LCM > behavior. I can't argue with that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
"David G. Johnston" writes: > If the SRFs return a different number of rows the LCM behavior kicks in and > you get Robert's second result. Only if the periods of the SRFs are relatively prime. That is, neither of his examples demonstrate the full weirdness of the current behavior; for that, you need periods that are multiples of each other. For instance: SELECT generate_series(1, 2), generate_series(1, 4); generate_series | generate_series -+- 1 | 1 2 | 2 1 | 3 2 | 4 (4 rows) That doesn't comport with any behavior available from LATERAL. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 06/06/16 18:30, David G. Johnston wrote: > To clarify, the present behavior is basically a combination of both of > Robert's results. > > If the SRFs return the same number of rows the first (zippered) result > is returned without an NULL padding. > > If the SRFs return a different number of rows the LCM behavior kicks in > and you get Robert's second result. No. > SELECT generate_series(1, 4), generate_series(1, 4) ORDER BY 1, 2; > is the same as > SELECT * FROM ROWS FROM ( generate_series(1, 4), generate_series(1, 4) ); > > BUT > > SELECT generate_series(1, 3), generate_series(1, 4) ORDER BY 1, 2; > is the same as > SELECT * FROM ROWS FROM generate_series(1, 3) a, LATERAL ROWS FROM > generate_series(1, 4) b; What would you do with: SELECT generate_series(1, 3), generate_series(1, 6); ? > Tom's 2.5 proposal basically says we make the former equivalence succeed > and have the later one fail. > > The rewrite would be unaware of the cardinality of the SRF and so it > cannot conditionally rewrite the query. One of the two must be chosen > and the incompatible behavior turned into an error. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, Jun 6, 2016 at 11:50 AM, Tom Lane wrote: > Robert Haas writes: > > On Mon, May 23, 2016 at 4:15 PM, Tom Lane wrote: > >> 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would > >> have the same behavior as before if the SRFs all return the same number > >> of rows, and otherwise would behave differently. > > > I thought the idea was to rewrite it as LATERAL ROWS FROM (srf1()), > > LATERAL ROWS FROM (srf2()), ... > > No, because then you get the cross-product of multiple SRFs, not the > run-in-lockstep behavior. > > > The rewrite you propose here seems to NULL-pad rows after the first > > SRF is exhausted: > > Yes. That's why I said it's not compatible if the SRFs don't all return > the same number of rows. It seems like a reasonable definition to me > though, certainly much more reasonable than the current run-until-LCM > behavior. > IOW, this is why this mode query has to fail. > > > The latter is how I'd expect SRF-in-targetlist to work. > > That's not even close to how it works now. It would break *every* > existing application that has multiple SRFs in the tlist, not just > the ones whose SRFs return different numbers of rows. And I'm not > convinced that it's a more useful behavior. > To clarify, the present behavior is basically a combination of both of Robert's results. If the SRFs return the same number of rows the first (zippered) result is returned without an NULL padding. If the SRFs return a different number of rows the LCM behavior kicks in and you get Robert's second result. SELECT generate_series(1, 4), generate_series(1, 4) ORDER BY 1, 2; is the same as SELECT * FROM ROWS FROM ( generate_series(1, 4), generate_series(1, 4) ); BUT SELECT generate_series(1, 3), generate_series(1, 4) ORDER BY 1, 2; is the same as SELECT * FROM ROWS FROM generate_series(1, 3) a, LATERAL ROWS FROM generate_series(1, 4) b; Tom's 2.5 proposal basically says we make the former equivalence succeed and have the later one fail. The rewrite would be unaware of the cardinality of the SRF and so it cannot conditionally rewrite the query. One of the two must be chosen and the incompatible behavior turned into an error. David J.
Re: [HACKERS] Changed SRF in targetlist handling
Robert Haas writes: > On Mon, May 23, 2016 at 4:15 PM, Tom Lane wrote: >> 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would >> have the same behavior as before if the SRFs all return the same number >> of rows, and otherwise would behave differently. > I thought the idea was to rewrite it as LATERAL ROWS FROM (srf1()), > LATERAL ROWS FROM (srf2()), ... No, because then you get the cross-product of multiple SRFs, not the run-in-lockstep behavior. > The rewrite you propose here seems to NULL-pad rows after the first > SRF is exhausted: Yes. That's why I said it's not compatible if the SRFs don't all return the same number of rows. It seems like a reasonable definition to me though, certainly much more reasonable than the current run-until-LCM behavior. > The latter is how I'd expect SRF-in-targetlist to work. That's not even close to how it works now. It would break *every* existing application that has multiple SRFs in the tlist, not just the ones whose SRFs return different numbers of rows. And I'm not convinced that it's a more useful behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 4:15 PM, Tom Lane wrote: > We should consider single and multiple SRFs in a targetlist as distinct > use-cases; only the latter has got weird properties. > > There are several things we could potentially do with multiple SRFs in > the same targetlist. In increasing order of backwards compatibility and > effort required: > > 1. Throw error if there's more than one SRF. > > 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would > have the same behavior as before if the SRFs all return the same number > of rows, and otherwise would behave differently. I thought the idea was to rewrite it as LATERAL ROWS FROM (srf1()), LATERAL ROWS FROM (srf2()), ... The rewrite you propose here seems to NULL-pad rows after the first SRF is exhausted: rhaas=# select * from dual, lateral rows from (generate_series(1,3), generate_series(1,4)); x | generate_series | generate_series ---+-+- dummy | 1 | 1 dummy | 2 | 2 dummy | 3 | 3 dummy | | 4 (4 rows) ...whereas with a separate LATERAL clause for each row you get this: rhaas=# select * from dual, lateral rows from (generate_series(1,3)) a, lateral rows from (generate_series(1,4)) b; x | a | b ---+---+--- dummy | 1 | 1 dummy | 1 | 2 dummy | 1 | 3 dummy | 1 | 4 dummy | 2 | 1 dummy | 2 | 2 dummy | 2 | 3 dummy | 2 | 4 dummy | 3 | 1 dummy | 3 | 2 dummy | 3 | 3 dummy | 3 | 4 (12 rows) The latter is how I'd expect SRF-in-targetlist to work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
"David G. Johnston" writes: > On Friday, June 3, 2016, Tom Lane > wrote: >> Merlin Moncure writes: >>> another interesting case today is: >>> create sequence s; >>> select generate_series(1,nextval('s')), generate_series(1,nextval('s')); > If taking the 2.5 approach this one would fail as opposed to being > rewritten. Well, it'd be rewritten and then would fail at runtime because of the SRF calls not producing the same number of rows. But even option #3 would not be strictly bug-compatible because it would (I imagine) evaluate the arguments of each SRF only once. The reason this case doesn't terminate in the current implementation is that it re-evaluates the SRF arguments each time we start a SRF over. That's just weird ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Changed SRF in targetlist handling
On Friday, June 3, 2016, Tom Lane > wrote: > Merlin Moncure writes: > > On Wed, May 25, 2016 at 3:55 PM, Tom Lane wrote: > >> Andres Freund writes: > >>> If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by > >>> option 1), that'd keep most of the functionality, and would break > >>> visibly rather than invisibly in the cases where not. > >> 2.5 would be okay with me. > > > Curious if this approach will also rewrite: > > select generate_series(1,generate_series(1,3)) s; > > ...into > > select s from generate_series(1,3) x cross join lateral > generate_series(1,x) s; > > Yeah, that would be the idea. > > Ok... It's only a single srf as far as the outer query is concerned so while it is odd the behavior is well defined and can be transformed while giving the same result. > > another interesting case today is: > > create sequence s; > > select generate_series(1,nextval('s')), generate_series(1,nextval('s')); > > > this statement never terminates. a lateral rewrite of this query > > would always terminate with much better defined and well understood > > behaviors -- this is good. > > Interesting example demonstrating that 100% bug compatibility is not > possible. But as you say, most people would probably prefer the other > behavior anyhow. > > If taking the 2.5 approach this one would fail as opposed to being rewritten. This could be an exception to the policy in #3 and would be ok in #2. It would fail in #1. Given the apparent general consensus for 2.5 and the lack of working field versions of this form the error seems like a no brainer. David J.
Re: [HACKERS] Changed SRF in targetlist handling
Merlin Moncure writes: > On Wed, May 25, 2016 at 3:55 PM, Tom Lane wrote: >> Andres Freund writes: >>> If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by >>> option 1), that'd keep most of the functionality, and would break >>> visibly rather than invisibly in the cases where not. >> 2.5 would be okay with me. > Curious if this approach will also rewrite: > select generate_series(1,generate_series(1,3)) s; > ...into > select s from generate_series(1,3) x cross join lateral generate_series(1,x) > s; Yeah, that would be the idea. > another interesting case today is: > create sequence s; > select generate_series(1,nextval('s')), generate_series(1,nextval('s')); > this statement never terminates. a lateral rewrite of this query > would always terminate with much better defined and well understood > behaviors -- this is good. Interesting example demonstrating that 100% bug compatibility is not possible. But as you say, most people would probably prefer the other behavior anyhow. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Wed, May 25, 2016 at 3:55 PM, Tom Lane wrote: > Andres Freund writes: >> On 2016-05-25 15:20:03 -0400, Tom Lane wrote: >>> We could certainly make a variant behavior in nodeFunctionscan.c that >>> emulates that, if we feel that being exactly bug-compatible on the point >>> is actually what we want. I'm dubious about that though, not least >>> because I don't think *anyone* actually believes that that behavior isn't >>> broken. Did you read my upthread message suggesting assorted compromise >>> choices? > >> You mean >> https://www.postgresql.org/message-id/21076.1464034...@sss.pgh.pa.us ? >> If so, yes. > >> If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by >> option 1), that'd keep most of the functionality, and would break >> visibly rather than invisibly in the cases where not. > > 2.5 would be okay with me. > >> I guess you're not planning to work on this? > > Well, not right now, as it's clearly too late for 9.6. I might hack on > it later if nobody beats me to it. Curious if this approach will also rewrite: select generate_series(1,generate_series(1,3)) s; ...into select s from generate_series(1,3) x cross join lateral generate_series(1,x) s; another interesting case today is: create sequence s; select generate_series(1,nextval('s')), generate_series(1,nextval('s')); this statement never terminates. a lateral rewrite of this query would always terminate with much better defined and well understood behaviors -- this is good. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Andres Freund writes: > On 2016-05-25 15:20:03 -0400, Tom Lane wrote: >> We could certainly make a variant behavior in nodeFunctionscan.c that >> emulates that, if we feel that being exactly bug-compatible on the point >> is actually what we want. I'm dubious about that though, not least >> because I don't think *anyone* actually believes that that behavior isn't >> broken. Did you read my upthread message suggesting assorted compromise >> choices? > You mean https://www.postgresql.org/message-id/21076.1464034...@sss.pgh.pa.us > ? > If so, yes. > If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by > option 1), that'd keep most of the functionality, and would break > visibly rather than invisibly in the cases where not. 2.5 would be okay with me. > I guess you're not planning to work on this? Well, not right now, as it's clearly too late for 9.6. I might hack on it later if nobody beats me to it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-05-25 15:20:03 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2016-05-25 15:02:23 -0400, Tom Lane wrote: > >> [ shrug... ] That seems like it's morally equivalent to (but uglier than) > >> what I wanted to do, which is to teach the planner to rewrite the query to > >> put the SRFs into a lateral FROM item. Splitting the tlist into two > >> levels will work out to be exactly the same rewriting problem. > > > I think that depends on how bug compatible we want to be. It seems > > harder to get the (rather odd!) lockstep iteration behaviour between two > > SRFS with the LATERAL approach? > > We could certainly make a variant behavior in nodeFunctionscan.c that > emulates that, if we feel that being exactly bug-compatible on the point > is actually what we want. I'm dubious about that though, not least > because I don't think *anyone* actually believes that that behavior isn't > broken. Did you read my upthread message suggesting assorted compromise > choices? You mean https://www.postgresql.org/message-id/21076.1464034...@sss.pgh.pa.us ? If so, yes. If we go with rewriting this into LATERAL, I'd vote for 2.5 (trailed by option 1), that'd keep most of the functionality, and would break visibly rather than invisibly in the cases where not. I guess you're not planning to work on this? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Andres Freund writes: > On 2016-05-25 15:02:23 -0400, Tom Lane wrote: >> [ shrug... ] That seems like it's morally equivalent to (but uglier than) >> what I wanted to do, which is to teach the planner to rewrite the query to >> put the SRFs into a lateral FROM item. Splitting the tlist into two >> levels will work out to be exactly the same rewriting problem. > I think that depends on how bug compatible we want to be. It seems > harder to get the (rather odd!) lockstep iteration behaviour between two > SRFS with the LATERAL approach? We could certainly make a variant behavior in nodeFunctionscan.c that emulates that, if we feel that being exactly bug-compatible on the point is actually what we want. I'm dubious about that though, not least because I don't think *anyone* actually believes that that behavior isn't broken. Did you read my upthread message suggesting assorted compromise choices? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-05-25 15:02:23 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2016-05-23 13:10:29 -0400, Tom Lane wrote: > >> Would that not lead to, in effect, duplicating all of execQual.c? The new > >> executor node would still have to be prepared to process all expression > >> node types. > > > I don't think it necessarily has to. ISTM that if we add a version of > > ExecProject()/ExecTargetList() that continues returning multiple rows, > > we can make the knowledge about the one type of expression we allow to > > return multiple rows. That'd require a bit of uglyness to implement > > stuff like > > SELECT generate_series(1, 2)::text, generate_series(1, 2) * 5; > > etc. It seems we'd basically have to do one projection step for the > > SRFs, and then another for the rest. I'm inclined to think that's > > acceptable to get rid of a lot of the related uglyness. > > [ shrug... ] That seems like it's morally equivalent to (but uglier than) > what I wanted to do, which is to teach the planner to rewrite the query to > put the SRFs into a lateral FROM item. Splitting the tlist into two > levels will work out to be exactly the same rewriting problem. I think that depends on how bug compatible we want to be. It seems harder to get the (rather odd!) lockstep iteration behaviour between two SRFS with the LATERAL approach? tpch[6098][1]=# SELECT generate_series(1, 3), generate_series(1,3); ┌─┬─┐ │ generate_series │ generate_series │ ├─┼─┤ │ 1 │ 1 │ │ 2 │ 2 │ │ 3 │ 3 │ └─┴─┘ (3 rows) tpch[6098][1]=# SELECT generate_series(1, 3), generate_series(1,4); ┌─┬─┐ │ generate_series │ generate_series │ ├─┼─┤ │ 1 │ 1 │ │ 2 │ 2 │ │ 3 │ 3 │ │ 1 │ 4 │ │ 2 │ 1 │ │ 3 │ 2 │ │ 1 │ 3 │ │ 2 │ 4 │ │ 3 │ 1 │ │ 1 │ 2 │ │ 2 │ 3 │ │ 3 │ 4 │ └─┴─┘ (12 rows) Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Andres Freund writes: > On 2016-05-23 13:10:29 -0400, Tom Lane wrote: >> Would that not lead to, in effect, duplicating all of execQual.c? The new >> executor node would still have to be prepared to process all expression >> node types. > I don't think it necessarily has to. ISTM that if we add a version of > ExecProject()/ExecTargetList() that continues returning multiple rows, > we can make the knowledge about the one type of expression we allow to > return multiple rows. That'd require a bit of uglyness to implement > stuff like > SELECT generate_series(1, 2)::text, generate_series(1, 2) * 5; > etc. It seems we'd basically have to do one projection step for the > SRFs, and then another for the rest. I'm inclined to think that's > acceptable to get rid of a lot of the related uglyness. [ shrug... ] That seems like it's morally equivalent to (but uglier than) what I wanted to do, which is to teach the planner to rewrite the query to put the SRFs into a lateral FROM item. Splitting the tlist into two levels will work out to be exactly the same rewriting problem. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 2016-05-23 13:10:29 -0400, Tom Lane wrote: > Andres Freund writes: > > One idea I circulated was to fix that by interjecting a special executor > > node to process SRF containing targetlists (reusing Result possibly?). > > That'd allow to remove the isDone argument from ExecEval*/ExecProject* > > and get rid of ps_TupFromTlist which is fairly ugly. > > Would that not lead to, in effect, duplicating all of execQual.c? The new > executor node would still have to be prepared to process all expression > node types. I don't think it necessarily has to. ISTM that if we add a version of ExecProject()/ExecTargetList() that continues returning multiple rows, we can make the knowledge about the one type of expression we allow to return multiple rows. That'd require a bit of uglyness to implement stuff like SELECT generate_series(1, 2)::text, generate_series(1, 2) * 5; etc. It seems we'd basically have to do one projection step for the SRFs, and then another for the rest. I'm inclined to think that's acceptable to get rid of a lot of the related uglyness. > > One issue with removing targetlist SRFs is that they're currently > > considerably faster than SRFs in FROM: > > I suspect that depends greatly on your test case. But in any case > we could put more effort into optimizing nodeFunctionscan. I doubt you'll find cases where it's significantly the other way round for percall SRFs. The fundamental issue is that targetlist SRFs don't have to spill to a tuplestore, whereas nodeFunctionscan ones have to (even if they're percall). -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 05/23/2016 02:37 PM, David G. Johnston wrote: > But then I don't get Joe's point - if its an implementation detail why > should it matter if rewriting the SRF-in-tlist to be laterals changes > execution from a serial to an interleaved implementation. Plus, Joe's > claim: "the capability to pipeline results is still only available in > the target list", and yours above are at odds since you claim the > rewritten behavior is the same today. Is there a disconnect in > knowledge or are you talking about different things? Unless there have been recent changes which I missed, ValuePerCall SRFs are still run to completion in one go, when executed in the FROM clause, but they project one-row-at-a-time in the target list. If your SRF returns many-many rows, the problem with the former case is that the entire thing has to be materialized in memory. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 4:42 PM, Tom Lane wrote: > "David G. Johnston" writes: > > On Mon, May 23, 2016 at 4:24 PM, Alvaro Herrera < > alvhe...@2ndquadrant.com> > > wrote: > >> Ah, so that's what "pipeline results" mean! I hadn't gotten that. I > >> agree; Abhijit had a patch or a plan for this, a long time ago ... > > > Is this sidebar strictly an implementation detail, not user visible? > > Hmm. It could be visible in the sense that the execution of multiple > functions in one ROWS FROM() construct could be interleaved, while > (I think) the current implementation runs each one to completion > serially. But if you're writing code that assumes that, I think you > should not be very surprised when we break it. In any case, that > would not affect the proposed translation for SRFs-in-tlist, since > those have that behavior today. > Thanks Sounds like "zipper results" would be a better term for it...but, yes, if that's the general context it falls into implementation from my perspective. But then I don't get Joe's point - if its an implementation detail why should it matter if rewriting the SRF-in-tlist to be laterals changes execution from a serial to an interleaved implementation. Plus, Joe's claim: "the capability to pipeline results is still only available in the target list", and yours above are at odds since you claim the rewritten behavior is the same today. Is there a disconnect in knowledge or are you talking about different things? David J.
Re: [HACKERS] Changed SRF in targetlist handling
"David G. Johnston" writes: > On Mon, May 23, 2016 at 4:24 PM, Alvaro Herrera > wrote: >> Ah, so that's what "pipeline results" mean! I hadn't gotten that. I >> agree; Abhijit had a patch or a plan for this, a long time ago ... > âIs this sidebar strictly an implementation detail, not user visible? Hmm. It could be visible in the sense that the execution of multiple functions in one ROWS FROM() construct could be interleaved, while (I think) the current implementation runs each one to completion serially. But if you're writing code that assumes that, I think you should not be very surprised when we break it. In any case, that would not affect the proposed translation for SRFs-in-tlist, since those have that behavior today. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 4:24 PM, Alvaro Herrera wrote: > Tom Lane wrote: > > Joe Conway writes: > > > > I'll also note that, unless I missed something, we also have to > consider > > > that the capability to pipeline results is still only available in the > > > target list. > > > > Yes, we would definitely want to improve nodeFunctionscan.c to perform > > better for ValuePerCall SRFs. But that has value independently of this. > > Ah, so that's what "pipeline results" mean! I hadn't gotten that. I > agree; Abhijit had a patch or a plan for this, a long time ago ... > > Is this sidebar strictly an implementation detail, not user visible? David J.
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 4:15 PM, Tom Lane wrote: > Merlin Moncure writes: > > On Mon, May 23, 2016 at 2:13 PM, David Fetter wrote: > >> On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote: > >>> +1 on removing LCM. > > >> As a green field project, that would make total sense. As a thing > >> decades in, it's not clear to me that that would break less stuff or > >> break it worse than simply disallowing SRFs in the target list, which > >> has been rejected on bugward-compatibility grounds. I suspect it > >> would be even worse because disallowing SRFs in target lists would at > >> least be obvious and localized when it broke code. > > > If I'm reading this correctly, it sounds to me like you are making the > > case that removing target list SRF completely would somehow cause less > > breakage than say, rewriting it to a LATERAL based implementation for > > example. With more than a little forbearance, let's just say I don't > > agree. > > We should consider single and multiple SRFs in a targetlist as distinct > use-cases; only the latter has got weird properties. > > There are several things we could potentially do with multiple SRFs in > the same targetlist. In increasing order of backwards compatibility and > effort required: > > 1. Throw error if there's more than one SRF. > > 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would > have the same behavior as before if the SRFs all return the same number > of rows, and otherwise would behave differently. > > 3. Rewrite into some other construct that still ends up as a FunctionScan > RTE node, but has the old LCM behavior if the SRFs produce different > numbers of rows. (Perhaps we would not need to expose this construct > as something directly SQL-visible.) > > It's certainly arguable that the common use-cases for SRF-in-tlist > don't have more than one SRF per tlist, and thus that implementing #1 > would be an appropriate amount of effort. It's worth noting also that > the LCM behavior has been repeatedly reported as a bug, and therefore > that if we do #3 we'll be expending very substantial effort to be > literally bug-compatible with ancient behavior that no one in the > current development group thinks is well-designed. As far as #2 goes, > it would have the advantage that code depending on the same-number-of- > rows case would continue to work as before. David has a point that it > would silently break application code that's actually depending on the > LCM behavior, but how much of that is there likely to be, really? > > [ reflects a bit... ] I guess there is room for an option 2-and-a-half: > > 2.5. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...), but decorate > the FunctionScan RTE to tell the executor to throw an error if the SRFs > don't all return the same number of rows, rather than silently > null-padding. This would have the same behavior as before for the sane > case, and would be very not-silent about cases where apps actually invoked > the LCM behavior. Again, we wouldn't necessarily have to expose such an > option at the SQL level. (Though it strikes me that such a restriction > could have value in its own right, analogous to the STRICT options that > we've invented in some other places to allow insisting on the expected > numbers of rows being returned. ROWS FROM STRICT (srf1(), srf2(), ...), > anybody?) > I'd let the engineers decide between 1, 2.5, and 3 - but if we accomplish our goals while implementing #3 I'd say that would be the best outcome for the community as whole. We don't have the luxury of providing a safe-usage mode where people writing new queries get the error but pre-existing queries are considered OK. We will have to rely upon education and deal with the occasional bug report but our long-time customers, even if only a minority would be affected, will appreciate the effort taken to not break code that has been working for a long time. The minority is likely small enough to at least make options 1 and 2.5 viable though I'd think we make an effort to avoid #1. David J.
Re: [HACKERS] Changed SRF in targetlist handling
Tom Lane wrote: > Joe Conway writes: > > I'll also note that, unless I missed something, we also have to consider > > that the capability to pipeline results is still only available in the > > target list. > > Yes, we would definitely want to improve nodeFunctionscan.c to perform > better for ValuePerCall SRFs. But that has value independently of this. Ah, so that's what "pipeline results" mean! I hadn't gotten that. I agree; Abhijit had a patch or a plan for this, a long time ago ... -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Joe Conway writes: > I would be in favor of rewriting it to a LATERAL, but that would not be > backwards compatible entirely either IIUC. It could be made so, I think, but it may be more trouble than it's worth; see my previous message. > I'll also note that, unless I missed something, we also have to consider > that the capability to pipeline results is still only available in the > target list. Yes, we would definitely want to improve nodeFunctionscan.c to perform better for ValuePerCall SRFs. But that has value independently of this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 4:05 PM, David Fetter wrote: > On Mon, May 23, 2016 at 02:39:54PM -0500, Merlin Moncure wrote: > > On Mon, May 23, 2016 at 2:13 PM, David Fetter wrote: > > > On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote: > > >> On Mon, May 23, 2016 at 12:10 PM, Tom Lane wrote: > > >> > Andres Freund writes: > > >> >> discussing executor performance with a number of people at pgcon, > > >> >> several hackers - me included - complained about the additional > > >> >> complexity, both code and runtime, required to handle SRFs in the > target > > >> >> list. > > >> > > > >> > Yeah, this has been an annoyance for a long time. > > >> > > > >> >> One idea I circulated was to fix that by interjecting a special > executor > > >> >> node to process SRF containing targetlists (reusing Result > possibly?). > > >> >> That'd allow to remove the isDone argument from > ExecEval*/ExecProject* > > >> >> and get rid of ps_TupFromTlist which is fairly ugly. > > >> > > > >> > Would that not lead to, in effect, duplicating all of execQual.c? > The new > > >> > executor node would still have to be prepared to process all > expression > > >> > node types. > > >> > > > >> >> Robert suggested - IIRC mentioning previous on-list discussion - to > > >> >> instead rewrite targetlist SRFs into lateral joins. My gut feeling > is > > >> >> that that'd be a larger undertaking, with significant semantics > changes. > > >> > > > >> > Yes, this was discussed on-list awhile back (I see David found a > reference > > >> > already). I think it's feasible, although we'd first have to agree > > >> > whether we want to remain bug-compatible with the old > > >> > least-common-multiple-of-the-periods behavior. I would vote for > not, > > >> > but it's certainly a debatable thing. > > >> > > >> +1 on removing LCM. > > > > > > As a green field project, that would make total sense. As a thing > > > decades in, it's not clear to me that that would break less stuff or > > > break it worse than simply disallowing SRFs in the target list, which > > > has been rejected on bugward-compatibility grounds. I suspect it > > > would be even worse because disallowing SRFs in target lists would at > > > least be obvious and localized when it broke code. > > > > If I'm reading this correctly, it sounds to me like you are making the > > case that removing target list SRF completely would somehow cause less > > breakage than say, rewriting it to a LATERAL based implementation for > > example. > > Yes. > > Making SRFs in target lists throw an error is a thing that will be > pretty straightforward to deal with in extant code bases, whatever > size of pain in the neck it might be. The line of code that caused > the error would be very clear, and the fix would be very obvious. > > Making their behavior different in some way that throws no warnings is > guaranteed to cause subtle and hard to track bugs in extant code > bases. I'm advocating that if a presently allowed SRF-in-target-list is allowed to remain it executes using the same semantics it has today. In all other cases, including LCM, if the present behavior is undesirable to maintain we throw an error. I'd hope that such an error can be written in such a way as to name the offending function or functions. If the user of a complex query doesn't want to expend the effort to locate the specific instance of SRF that is in violation they will still have the option to rewrite all of their uses in that particular query. David J.
Re: [HACKERS] Changed SRF in targetlist handling
Merlin Moncure writes: > On Mon, May 23, 2016 at 2:13 PM, David Fetter wrote: >> On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote: >>> +1 on removing LCM. >> As a green field project, that would make total sense. As a thing >> decades in, it's not clear to me that that would break less stuff or >> break it worse than simply disallowing SRFs in the target list, which >> has been rejected on bugward-compatibility grounds. I suspect it >> would be even worse because disallowing SRFs in target lists would at >> least be obvious and localized when it broke code. > If I'm reading this correctly, it sounds to me like you are making the > case that removing target list SRF completely would somehow cause less > breakage than say, rewriting it to a LATERAL based implementation for > example. With more than a little forbearance, let's just say I don't > agree. We should consider single and multiple SRFs in a targetlist as distinct use-cases; only the latter has got weird properties. There are several things we could potentially do with multiple SRFs in the same targetlist. In increasing order of backwards compatibility and effort required: 1. Throw error if there's more than one SRF. 2. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...). This would have the same behavior as before if the SRFs all return the same number of rows, and otherwise would behave differently. 3. Rewrite into some other construct that still ends up as a FunctionScan RTE node, but has the old LCM behavior if the SRFs produce different numbers of rows. (Perhaps we would not need to expose this construct as something directly SQL-visible.) It's certainly arguable that the common use-cases for SRF-in-tlist don't have more than one SRF per tlist, and thus that implementing #1 would be an appropriate amount of effort. It's worth noting also that the LCM behavior has been repeatedly reported as a bug, and therefore that if we do #3 we'll be expending very substantial effort to be literally bug-compatible with ancient behavior that no one in the current development group thinks is well-designed. As far as #2 goes, it would have the advantage that code depending on the same-number-of- rows case would continue to work as before. David has a point that it would silently break application code that's actually depending on the LCM behavior, but how much of that is there likely to be, really? [ reflects a bit... ] I guess there is room for an option 2-and-a-half: 2.5. Rewrite into LATERAL ROWS FROM (srf1(), srf2(), ...), but decorate the FunctionScan RTE to tell the executor to throw an error if the SRFs don't all return the same number of rows, rather than silently null-padding. This would have the same behavior as before for the sane case, and would be very not-silent about cases where apps actually invoked the LCM behavior. Again, we wouldn't necessarily have to expose such an option at the SQL level. (Though it strikes me that such a restriction could have value in its own right, analogous to the STRICT options that we've invented in some other places to allow insisting on the expected numbers of rows being returned. ROWS FROM STRICT (srf1(), srf2(), ...), anybody?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 02:39:54PM -0500, Merlin Moncure wrote: > On Mon, May 23, 2016 at 2:13 PM, David Fetter wrote: > > On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote: > >> On Mon, May 23, 2016 at 12:10 PM, Tom Lane wrote: > >> > Andres Freund writes: > >> >> discussing executor performance with a number of people at pgcon, > >> >> several hackers - me included - complained about the additional > >> >> complexity, both code and runtime, required to handle SRFs in the target > >> >> list. > >> > > >> > Yeah, this has been an annoyance for a long time. > >> > > >> >> One idea I circulated was to fix that by interjecting a special executor > >> >> node to process SRF containing targetlists (reusing Result possibly?). > >> >> That'd allow to remove the isDone argument from ExecEval*/ExecProject* > >> >> and get rid of ps_TupFromTlist which is fairly ugly. > >> > > >> > Would that not lead to, in effect, duplicating all of execQual.c? The > >> > new > >> > executor node would still have to be prepared to process all expression > >> > node types. > >> > > >> >> Robert suggested - IIRC mentioning previous on-list discussion - to > >> >> instead rewrite targetlist SRFs into lateral joins. My gut feeling is > >> >> that that'd be a larger undertaking, with significant semantics changes. > >> > > >> > Yes, this was discussed on-list awhile back (I see David found a > >> > reference > >> > already). I think it's feasible, although we'd first have to agree > >> > whether we want to remain bug-compatible with the old > >> > least-common-multiple-of-the-periods behavior. I would vote for not, > >> > but it's certainly a debatable thing. > >> > >> +1 on removing LCM. > > > > As a green field project, that would make total sense. As a thing > > decades in, it's not clear to me that that would break less stuff or > > break it worse than simply disallowing SRFs in the target list, which > > has been rejected on bugward-compatibility grounds. I suspect it > > would be even worse because disallowing SRFs in target lists would at > > least be obvious and localized when it broke code. > > If I'm reading this correctly, it sounds to me like you are making the > case that removing target list SRF completely would somehow cause less > breakage than say, rewriting it to a LATERAL based implementation for > example. Yes. Making SRFs in target lists throw an error is a thing that will be pretty straightforward to deal with in extant code bases, whatever size of pain in the neck it might be. The line of code that caused the error would be very clear, and the fix would be very obvious. Making their behavior different in some way that throws no warnings is guaranteed to cause subtle and hard to track bugs in extant code bases. We lost not a few existing users when we caused similar knock-ons in 8.3 by removing automated casts to text. I am no longer advocating for removing the functionality. I am just pointing out that the knock-on effects of changing the functionality may well cause more pain than the ones from removing it entirely. > With more than a little forbearance, let's just say I don't agree. If you'd be so kind as to explain your reasons, I think we'd all benefit. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On 05/23/2016 12:39 PM, Merlin Moncure wrote: > On Mon, May 23, 2016 at 2:13 PM, David Fetter wrote: >> On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote: >>> On Mon, May 23, 2016 at 12:10 PM, Tom Lane wrote: Andres Freund writes: > discussing executor performance with a number of people at pgcon, > several hackers - me included - complained about the additional > complexity, both code and runtime, required to handle SRFs in the target > list. Yeah, this has been an annoyance for a long time. > One idea I circulated was to fix that by interjecting a special executor > node to process SRF containing targetlists (reusing Result possibly?). > That'd allow to remove the isDone argument from ExecEval*/ExecProject* > and get rid of ps_TupFromTlist which is fairly ugly. Would that not lead to, in effect, duplicating all of execQual.c? The new executor node would still have to be prepared to process all expression node types. > Robert suggested - IIRC mentioning previous on-list discussion - to > instead rewrite targetlist SRFs into lateral joins. My gut feeling is > that that'd be a larger undertaking, with significant semantics changes. Yes, this was discussed on-list awhile back (I see David found a reference already). I think it's feasible, although we'd first have to agree whether we want to remain bug-compatible with the old least-common-multiple-of-the-periods behavior. I would vote for not, but it's certainly a debatable thing. >>> >>> +1 on removing LCM. >> >> As a green field project, that would make total sense. As a thing >> decades in, it's not clear to me that that would break less stuff or >> break it worse than simply disallowing SRFs in the target list, which >> has been rejected on bugward-compatibility grounds. I suspect it >> would be even worse because disallowing SRFs in target lists would at >> least be obvious and localized when it broke code. > > If I'm reading this correctly, it sounds to me like you are making the > case that removing target list SRF completely would somehow cause less > breakage than say, rewriting it to a LATERAL based implementation for > example. With more than a little forbearance, let's just say I don't > agree. I'm not necessarily saying that we should totally remove target list SRFs, but I will point out it has been deprecated ever since SRFs were first introduced: http://www.postgresql.org/docs/7.3/static/xfunc-sql.html "Currently, functions returning sets may also be called in the target list of a SELECT query. For each row that the SELECT generates by itself, the function returning set is invoked, and an output row is generated for each element of the function's result set. Note, however, that this capability is deprecated and may be removed in future releases." I would be in favor of rewriting it to a LATERAL, but that would not be backwards compatible entirely either IIUC. I'll also note that, unless I missed something, we also have to consider that the capability to pipeline results is still only available in the target list. Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development signature.asc Description: OpenPGP digital signature
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 2:13 PM, David Fetter wrote: > On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote: >> On Mon, May 23, 2016 at 12:10 PM, Tom Lane wrote: >> > Andres Freund writes: >> >> discussing executor performance with a number of people at pgcon, >> >> several hackers - me included - complained about the additional >> >> complexity, both code and runtime, required to handle SRFs in the target >> >> list. >> > >> > Yeah, this has been an annoyance for a long time. >> > >> >> One idea I circulated was to fix that by interjecting a special executor >> >> node to process SRF containing targetlists (reusing Result possibly?). >> >> That'd allow to remove the isDone argument from ExecEval*/ExecProject* >> >> and get rid of ps_TupFromTlist which is fairly ugly. >> > >> > Would that not lead to, in effect, duplicating all of execQual.c? The new >> > executor node would still have to be prepared to process all expression >> > node types. >> > >> >> Robert suggested - IIRC mentioning previous on-list discussion - to >> >> instead rewrite targetlist SRFs into lateral joins. My gut feeling is >> >> that that'd be a larger undertaking, with significant semantics changes. >> > >> > Yes, this was discussed on-list awhile back (I see David found a reference >> > already). I think it's feasible, although we'd first have to agree >> > whether we want to remain bug-compatible with the old >> > least-common-multiple-of-the-periods behavior. I would vote for not, >> > but it's certainly a debatable thing. >> >> +1 on removing LCM. > > As a green field project, that would make total sense. As a thing > decades in, it's not clear to me that that would break less stuff or > break it worse than simply disallowing SRFs in the target list, which > has been rejected on bugward-compatibility grounds. I suspect it > would be even worse because disallowing SRFs in target lists would at > least be obvious and localized when it broke code. If I'm reading this correctly, it sounds to me like you are making the case that removing target list SRF completely would somehow cause less breakage than say, rewriting it to a LATERAL based implementation for example. With more than a little forbearance, let's just say I don't agree. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 01:28:11PM -0500, Merlin Moncure wrote: > On Mon, May 23, 2016 at 12:10 PM, Tom Lane wrote: > > Andres Freund writes: > >> discussing executor performance with a number of people at pgcon, > >> several hackers - me included - complained about the additional > >> complexity, both code and runtime, required to handle SRFs in the target > >> list. > > > > Yeah, this has been an annoyance for a long time. > > > >> One idea I circulated was to fix that by interjecting a special executor > >> node to process SRF containing targetlists (reusing Result possibly?). > >> That'd allow to remove the isDone argument from ExecEval*/ExecProject* > >> and get rid of ps_TupFromTlist which is fairly ugly. > > > > Would that not lead to, in effect, duplicating all of execQual.c? The new > > executor node would still have to be prepared to process all expression > > node types. > > > >> Robert suggested - IIRC mentioning previous on-list discussion - to > >> instead rewrite targetlist SRFs into lateral joins. My gut feeling is > >> that that'd be a larger undertaking, with significant semantics changes. > > > > Yes, this was discussed on-list awhile back (I see David found a reference > > already). I think it's feasible, although we'd first have to agree > > whether we want to remain bug-compatible with the old > > least-common-multiple-of-the-periods behavior. I would vote for not, > > but it's certainly a debatable thing. > > +1 on removing LCM. As a green field project, that would make total sense. As a thing decades in, it's not clear to me that that would break less stuff or break it worse than simply disallowing SRFs in the target list, which has been rejected on bugward-compatibility grounds. I suspect it would be even worse because disallowing SRFs in target lists would at least be obvious and localized when it broke code. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 1:44 PM, David Fetter wrote: > On Mon, May 23, 2016 at 01:36:57PM -0400, Tom Lane wrote: > > David Fetter writes: > > > On Mon, May 23, 2016 at 01:10:29PM -0400, Tom Lane wrote: > > >> This seems a bridge too far to me. It's just way too common to do > > >> "select generate_series(1,n)". We could tell people they have to > > >> rewrite to "select * from generate_series(1,n)", but it would be far > > >> more polite to do that for them. > > > > > How about making "TABLE generate_series(1,n)" work? It's even > > > shorter in exchange for some cognitive load. > > > > No thanks --- the word after TABLE ought to be a table name, not some > > arbitrary expression. That's way too much mess to save one keystroke. > > It's not just about saving a keystroke. This change would go with > removing the ability to do SRFs in the target list of a SELECT > query. > If you want to make an argument for doing this regardless of the target list SRF change by all means - but it does absolutely nothing to mitigate the breakage that would result if we choose this path. David J.
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 12:10 PM, Tom Lane wrote: > Andres Freund writes: >> discussing executor performance with a number of people at pgcon, >> several hackers - me included - complained about the additional >> complexity, both code and runtime, required to handle SRFs in the target >> list. > > Yeah, this has been an annoyance for a long time. > >> One idea I circulated was to fix that by interjecting a special executor >> node to process SRF containing targetlists (reusing Result possibly?). >> That'd allow to remove the isDone argument from ExecEval*/ExecProject* >> and get rid of ps_TupFromTlist which is fairly ugly. > > Would that not lead to, in effect, duplicating all of execQual.c? The new > executor node would still have to be prepared to process all expression > node types. > >> Robert suggested - IIRC mentioning previous on-list discussion - to >> instead rewrite targetlist SRFs into lateral joins. My gut feeling is >> that that'd be a larger undertaking, with significant semantics changes. > > Yes, this was discussed on-list awhile back (I see David found a reference > already). I think it's feasible, although we'd first have to agree > whether we want to remain bug-compatible with the old > least-common-multiple-of-the-periods behavior. I would vote for not, > but it's certainly a debatable thing. +1 on removing LCM. The behavior of multiple targetlist SRF is so bizarre that it's incredible to believe anyone would reasonably expect it to work that way. Agree also that casual sane usage of target list SRF is incredibly common via generate_series() and unnest() etc is exceptionally common...better not to break those cases without a better justification than code simplicity. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
David Fetter writes: > On Mon, May 23, 2016 at 01:36:57PM -0400, Tom Lane wrote: >> David Fetter writes: >>> How about making "TABLE generate_series(1,n)" work? It's even >>> shorter in exchange for some cognitive load. >> No thanks --- the word after TABLE ought to be a table name, not some >> arbitrary expression. That's way too much mess to save one keystroke. > It's not just about saving a keystroke. This change would go with > removing the ability to do SRFs in the target list of a SELECT > query. I guess you did not understand that I was rejecting doing that. Telling people they have to modify existing code that does this and works fine is exactly what I felt we can't do. We might be able to blow off complicated cases, but I think simpler cases are too common in the field. I'm on board with fixing things so that the *implementation* doesn't support SRF-in-tlist. But we can't just remove it from the language. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 01:36:57PM -0400, Tom Lane wrote: > David Fetter writes: > > On Mon, May 23, 2016 at 01:10:29PM -0400, Tom Lane wrote: > >> This seems a bridge too far to me. It's just way too common to do > >> "select generate_series(1,n)". We could tell people they have to > >> rewrite to "select * from generate_series(1,n)", but it would be far > >> more polite to do that for them. > > > How about making "TABLE generate_series(1,n)" work? It's even > > shorter in exchange for some cognitive load. > > No thanks --- the word after TABLE ought to be a table name, not some > arbitrary expression. That's way too much mess to save one keystroke. It's not just about saving a keystroke. This change would go with removing the ability to do SRFs in the target list of a SELECT query. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
David Fetter writes: > On Mon, May 23, 2016 at 01:10:29PM -0400, Tom Lane wrote: >> This seems a bridge too far to me. It's just way too common to do >> "select generate_series(1,n)". We could tell people they have to >> rewrite to "select * from generate_series(1,n)", but it would be far >> more polite to do that for them. > How about making "TABLE generate_series(1,n)" work? It's even > shorter in exchange for some cognitive load. No thanks --- the word after TABLE ought to be a table name, not some arbitrary expression. That's way too much mess to save one keystroke. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
On Mon, May 23, 2016 at 01:10:29PM -0400, Tom Lane wrote: > This seems a bridge too far to me. It's just way too common to do > "select generate_series(1,n)". We could tell people they have to > rewrite to "select * from generate_series(1,n)", but it would be far > more polite to do that for them. How about making "TABLE generate_series(1,n)" work? It's even shorter in exchange for some cognitive load. Cheers, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
Andres Freund writes: > discussing executor performance with a number of people at pgcon, > several hackers - me included - complained about the additional > complexity, both code and runtime, required to handle SRFs in the target > list. Yeah, this has been an annoyance for a long time. > One idea I circulated was to fix that by interjecting a special executor > node to process SRF containing targetlists (reusing Result possibly?). > That'd allow to remove the isDone argument from ExecEval*/ExecProject* > and get rid of ps_TupFromTlist which is fairly ugly. Would that not lead to, in effect, duplicating all of execQual.c? The new executor node would still have to be prepared to process all expression node types. > Robert suggested - IIRC mentioning previous on-list discussion - to > instead rewrite targetlist SRFs into lateral joins. My gut feeling is > that that'd be a larger undertaking, with significant semantics changes. Yes, this was discussed on-list awhile back (I see David found a reference already). I think it's feasible, although we'd first have to agree whether we want to remain bug-compatible with the old least-common-multiple-of-the-periods behavior. I would vote for not, but it's certainly a debatable thing. > If we accept bigger semantical changes, I'm inclined to instead just get > rid of targetlist SRFs in total; they're really weird and not needed > anymore. This seems a bridge too far to me. It's just way too common to do "select generate_series(1,n)". We could tell people they have to rewrite to "select * from generate_series(1,n)", but it would be far more polite to do that for them. > One issue with removing targetlist SRFs is that they're currently > considerably faster than SRFs in FROM: I suspect that depends greatly on your test case. But in any case we could put more effort into optimizing nodeFunctionscan. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Changed SRF in targetlist handling
tl;dr Semantic changes to SRF-in-target-list processing are undesirable when they are all but deprecated. I'd accept a refactoring that trades a performance gain for unaffected queries for a reasonable performance hit of those afflicted. Preamble... Most recent thread that I can recall seeing on the topic - and where I believe the rewrite idea was first presented. http://www.postgresql.org/message-id/flat/25750.1458767...@sss.pgh.pa.us#25750.1458767...@sss.pgh.pa.us On Sun, May 22, 2016 at 8:53 PM, Andres Freund wrote: > Hi, > > discussing executor performance with a number of people at pgcon, > several hackers - me included - complained about the additional > complexity, both code and runtime, required to handle SRFs in the target > list. > > One idea I circulated was to fix that by interjecting a special executor > node to process SRF containing targetlists (reusing Result possibly?). > That'd allow to remove the isDone argument from ExecEval*/ExecProject* > and get rid of ps_TupFromTlist which is fairly ugly. > Conceptually I'm all for minimizing the impact on queries of this form. It seems to be the most likely to get written and committed and the least likely to cause unforeseen issues. > Robert suggested - IIRC mentioning previous on-list discussion - to > instead rewrite targetlist SRFs into lateral joins. My gut feeling is > that that'd be a larger undertaking, with significant semantics changes. > [...] > If we accept bigger semantical changes, I'm inclined to instead just get > rid of targetlist SRFs in total; they're really weird and not needed > anymore. > I cannot see these, in isolation, being a good option. Nonetheless, I don't think any semantic change should happen before 9.2 becomes no longer supported. I'd be inclined to take a similar approach as with standard_conforming_strings (minus the execution guc, just the warning one) with whatever after-the-fact learning taken into account. Its worth considering query rewrite and making it forbidden as a joint goal. For something like a canonical version of this, especially for composite-returning SRF: WITH func_call ( SELECT func(tbl.col) FROM tbl ) SELECT (func_call.func).* FROM func_call; If we can rewrite the CTE portion into a lateral - with the exact same semantics (specifically, returning the single-column composite) then check the rewritten query the select list SRF would not longer be present and no error would be thrown. For situations where a rewrite cannot be made to behave properly we leave the construct alone and let the query raise an error. In considering what I just wrote I'm not particularly enamored with it...hence my overall conclusion. Can't say I hate it and after re-reading the aforementioned thread I'm inclined to like it for cases where, for instance, we are susceptible to a LCM evaluation. David J.
Re: [HACKERS] Changed SRF in targetlist handling
On 23 May 2016 at 08:53, Andres Freund wrote: > Hi, > > discussing executor performance with a number of people at pgcon, > several hackers - me included - complained about the additional > complexity, both code and runtime, required to handle SRFs in the target > list. > > One idea I circulated was to fix that by interjecting a special executor > node to process SRF containing targetlists (reusing Result possibly?). > That'd allow to remove the isDone argument from ExecEval*/ExecProject* > and get rid of ps_TupFromTlist which is fairly ugly. > > > Robert suggested - IIRC mentioning previous on-list discussion - to > instead rewrite targetlist SRFs into lateral joins. My gut feeling is > that that'd be a larger undertaking, with significant semantics changes. > > If we accept bigger semantical changes, I'm inclined to instead just get > rid of targetlist SRFs in total; they're really weird and not needed > anymore. > > One issue with removing targetlist SRFs is that they're currently > considerably faster than SRFs in FROM: > tpch[14693][1]=# COPY (SELECT * FROM generate_series(1, 1000)) TO > '/dev/null'; > COPY 1000 > Time: 2217.167 ms > tpch[14693][1]=# COPY (SELECT generate_series(1, 1000)) TO '/dev/null'; > COPY 1000 > Time: 1355.929 ms > tpch[14693][1]=# > > I'm no tto concerned about that, and we could probably fixing by > removing forced materialization from the relevant code path. > > Comments? > > SRFs-in-tlist are a lot faster for lockstep iteration etc. They're also much simpler to write, though if the result result rowcount differs unexpectedly between the functions you get exciting and unexpected behaviour. WITH ORDINALITY provides what I think is the last of the functionality needed to replace SRFs-in-from, but at a syntatactic complexity and performance cost. The following example demonstrates that, though it doesn't do anything that needs LATERAL etc. I'm aware the following aren't semantically identical if the rowcounts differ. craig=> EXPLAIN ANALYZE SELECT generate_series(1,100) x, generate_series(1,100) y; QUERY PLAN -- Result (cost=0.00..5.01 rows=1000 width=0) (actual time=0.024..92.845 rows=100 loops=1) Planning time: 0.039 ms Execution time: 123.123 ms (3 rows) Time: 123.719 ms craig=> EXPLAIN ANALYZE SELECT x, y FROM generate_series(1,100) WITH ORDINALITY AS x(i, n) INNER JOIN generate_series(1,100) WITH ORDINALITY AS y(i, n) ON (x.n = y.n); QUERY PLAN -- Merge Join (cost=0.01..97.50 rows=5000 width=64) (actual time=179.863..938.375 rows=100 loops=1) Merge Cond: (x.n = y.n) -> Function Scan on generate_series x (cost=0.00..10.00 rows=1000 width=40) (actual time=108.813..303.690 rows=100 loops=1) -> Materialize (cost=0.00..12.50 rows=1000 width=40) (actual time=71.043..372.880 rows=100 loops=1) -> Function Scan on generate_series y (cost=0.00..10.00 rows=1000 width=40) (actual time=71.039..266.209 rows=100 loops=1) Planning time: 0.184 ms Execution time: 970.744 ms (7 rows) Time: 971.706 ms I get the impression the with-ordinality case could perform just as well if the optimiser recognised a join on the ordinality column and iterated the functions in lockstep to populate the result row directly. Though that could perform _worse_ if the function is computationally costly and benefits significantly from the CPU cache, where we're better off materializing it or at least executing it in chunks/batches... -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Changed SRF in targetlist handling
Hi, discussing executor performance with a number of people at pgcon, several hackers - me included - complained about the additional complexity, both code and runtime, required to handle SRFs in the target list. One idea I circulated was to fix that by interjecting a special executor node to process SRF containing targetlists (reusing Result possibly?). That'd allow to remove the isDone argument from ExecEval*/ExecProject* and get rid of ps_TupFromTlist which is fairly ugly. Robert suggested - IIRC mentioning previous on-list discussion - to instead rewrite targetlist SRFs into lateral joins. My gut feeling is that that'd be a larger undertaking, with significant semantics changes. If we accept bigger semantical changes, I'm inclined to instead just get rid of targetlist SRFs in total; they're really weird and not needed anymore. One issue with removing targetlist SRFs is that they're currently considerably faster than SRFs in FROM: tpch[14693][1]=# COPY (SELECT * FROM generate_series(1, 1000)) TO '/dev/null'; COPY 1000 Time: 2217.167 ms tpch[14693][1]=# COPY (SELECT generate_series(1, 1000)) TO '/dev/null'; COPY 1000 Time: 1355.929 ms tpch[14693][1]=# I'm no tto concerned about that, and we could probably fixing by removing forced materialization from the relevant code path. Comments? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers