Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
On Wed, Mar 23, 2016 at 3:18 PM, Stephen Frost wrote: > I have to side with what I believe is Tom's position on this one. I do > like the notion of throwing an error in cases where someone sent us > something that we're pretty sure is wrong, but I don't agree that we > should continue to carry on bug-compatibility with things that are > already one foot in the grave and really just need to be shoved all the > way in. Well, the fact that it turns out to be 2+ SRF, not just 1 as a trigger has significantly lowered my alarm. I agree that such usages are tiny and the LCM way of determining rows is definitely bizarre. Don't believe me? Try figuring out when select generate_series(1,nextval('s')), generate_series(1,nextval('s')); terminates. (hint: it doesn't). Another cute multiple SRF invocation: http://merlinmoncure.blogspot.com/2007/12/12-days-of-christmas-postgresql-style.html :-) > This isn't the only break in backwards compatibility we've had over the > years and is pretty far from the largest (string escaping, anyone? or > removing implicit casts?) and I'd argue we're better off for it. String escaping was an unmitigated disaster. Implict cast removal cost my company a couple of hundred thousand bucks and came within a hair of pushing postgres out completely (not that I'm complaining, we're the better for that but these decisions must not be taken lightly). Things are different now. On Wed, Mar 23, 2016 at 5:34 PM, Tom Lane wrote: > (The SRF invocations might be buried inside expressions, but we'd find > them and convert them anyway. Also, we'd merge textually-identical SRF > invocations into a single ROWS FROM entry to avoid multiple evaluation, > at least for SRFs not marked volatile.) Having done that, the executor's > support for SRFs in general expressions could be removed, a significant > savings. That's just brilliant -- I'd be on board with that FWIW. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
Robert Haas writes: > On Wed, Mar 23, 2016 at 6:34 PM, Tom Lane wrote: >> A possibly larger problem is that it causes the SRFs to be evaluated >> before sorting/ordering/limiting. > I'm not sure I understand quite what the problem is here. If you have "SELECT srf(x), y FROM ... ORDER BY y LIMIT n", we go to some lengths as of 9118d03a8 to ensure that the ordering happens before the SRF is expanded, meaning in particular that the SRF is expanded only far enough to satisfy the limit. That is not the case for SRF-in-FROM, and can't be if for instance there's a GROUP BY involved. As-is, the proposed transformation obviously breaks the semantics if grouping/aggregation/windowing is involved, and my point is that that's true for ORDER BY/LIMIT as well. We'd need to fix things so that the apparent order of performing those steps stays the same. I think that's probably doable in most cases by making a sub-select, but I'm not sure if that works for every case --- in particular, what do we do with a SRF referenced in ORDER BY or GROUP BY? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
On Wed, Mar 23, 2016 at 6:34 PM, Tom Lane wrote: > I wrote: >> ... I'd love to >> toss the entire SRF-in-tlist feature overboard one of these years, >> both because of semantic issues like these and because a fair amount >> of code and overhead could be ripped out if it were disallowed. >> But I don't know how we get to that --- as Merlin says, there's a >> pretty large amount of application code depending on the feature. > > BTW, after further thought I seem to recall some discussion about whether > we could mechanically transform SRF-in-tlist into a LATERAL query. > That is, we could make the rewriter or planner convert > > SELECT srf1(...), srf2(...) > FROM ... > WHERE ... > > into > > SELECT u.s1, u.s2 > FROM ..., LATERAL ROWS FROM(srf1(...), srf2(...)) AS u(s1,s2) > WHERE ... I think *that* would be grand. If I'm not wrong, that's the behavior that anybody would naively expect. > (The SRF invocations might be buried inside expressions, but we'd find > them and convert them anyway. Also, we'd merge textually-identical SRF > invocations into a single ROWS FROM entry to avoid multiple evaluation, > at least for SRFs not marked volatile.) Having done that, the executor's > support for SRFs in general expressions could be removed, a significant > savings. That is a highly appealing fringe benefit. > One problem with this is that it only duplicates the current behavior > in cases where the SRFs all have the same period. But you could argue > that the current behavior in cases where they don't is widely considered > a bug anyway. I would so argue. Also, wouldn't this fix the problem that (srf(blah)).* causes multiple evaluation? That would be *awesome*. > A possibly larger problem is that it causes the SRFs to be evaluated > before sorting/ordering/limiting. I'm not sure I understand quite what the problem is here. If there's a LIMIT, then the proposed transformation would presumably run the SRF only enough times to fill the limit. I'm not sure you have any guarantees about ordering anyway - doesn't that depend on whether the planner can find a way to produce presorted output via an index scan, merge join, etc.? > In view of the discussions that led > up to 9118d03a8, that could be a fatal objection :-(. Maybe we could > get around it with a different transformation, into a two-level query? > The above sketch doesn't really consider what to do with GROUP BY, > ORDER BY, etc, but maybe we could push those down into a sub-select > that becomes the first FROM item. That seems like it might work. > Anyway, that's food for thought not something that could realistically > happen right now. Ah, bummer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
> I'm something of a backwards compatibility zealot, but I've become one for > very good reasons. Personally, I'd rather we'd define precisely the usages > that are deprecated (I guess SRF-tlist in the presence of > FROM) and force them to error out with an appropriate HINT rather than give a > different answer than they used to. The problem here is that LATERAL is > still fairly new and there is a huge body of code out there leveraging the > 'bad' way, as it was for years > and years the only way to do a number of > useful things. > merlin FWIW: I prefer Merlin's solution of erroring out rather than returning an unexpected answer and if it's a buggy behavior it should be eradicated. The reason is this. For many the (..).* ORDER BY .. looks equivalent to the lateral. More than a trivial amount of my time has been spent explaining to people why their raster queries are so slow because the SRF is called multiple times and they should switch to LATERAL usage. So if the old solution is still going to have the same penalty's I would much assume just scrap it and break people's code in a clear and noticeable way they can't ignore. There is nothing more frustrating than code that still works but gives you an answer different than what you are expecting. Those kind of bugs stay buried for a while. I think as long as it's noted in the release notes of the breaking change it's fine and called for if it makes your code cleaner and more manageable. Thanks, Regina -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
I wrote: > ... I'd love to > toss the entire SRF-in-tlist feature overboard one of these years, > both because of semantic issues like these and because a fair amount > of code and overhead could be ripped out if it were disallowed. > But I don't know how we get to that --- as Merlin says, there's a > pretty large amount of application code depending on the feature. BTW, after further thought I seem to recall some discussion about whether we could mechanically transform SRF-in-tlist into a LATERAL query. That is, we could make the rewriter or planner convert SELECT srf1(...), srf2(...) FROM ... WHERE ... into SELECT u.s1, u.s2 FROM ..., LATERAL ROWS FROM(srf1(...), srf2(...)) AS u(s1,s2) WHERE ... (The SRF invocations might be buried inside expressions, but we'd find them and convert them anyway. Also, we'd merge textually-identical SRF invocations into a single ROWS FROM entry to avoid multiple evaluation, at least for SRFs not marked volatile.) Having done that, the executor's support for SRFs in general expressions could be removed, a significant savings. One problem with this is that it only duplicates the current behavior in cases where the SRFs all have the same period. But you could argue that the current behavior in cases where they don't is widely considered a bug anyway. A possibly larger problem is that it causes the SRFs to be evaluated before sorting/ordering/limiting. In view of the discussions that led up to 9118d03a8, that could be a fatal objection :-(. Maybe we could get around it with a different transformation, into a two-level query? The above sketch doesn't really consider what to do with GROUP BY, ORDER BY, etc, but maybe we could push those down into a sub-select that becomes the first FROM item. Anyway, that's food for thought not something that could realistically happen right now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
* David G. Johnston (david.g.johns...@gmail.com) wrote: > On Wed, Mar 23, 2016 at 2:11 PM, Tom Lane wrote: > > In the meantime I suppose there's a case to be made for preserving > > bug compatibility as much as possible. > > > > So anyway the question is whether to commit the attached or not. > > +1 for commit - I'll trust Tom on the quality of the patch :) I'm not going to object to it. All-in-all, I suppose '+0' from me. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
On Wed, Mar 23, 2016 at 2:11 PM, Tom Lane wrote: > > > In the meantime I suppose there's a case to be made for preserving > bug compatibility as much as possible. > > So anyway the question is whether to commit the attached or not. +1 for commit - I'll trust Tom on the quality of the patch :) David J.
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
Stephen Frost writes: > * Merlin Moncure (mmonc...@gmail.com) wrote: >> I'm something of a backwards compatibility zealot, but I've become one >> for very good reasons. Personally, I'd rather we'd define precisely >> the usages that are deprecated (I guess SRF-tlist in the presence of >> FROM) and force them to error out with an appropriate HINT rather than >> give a different answer than they used to. The problem here is that >> LATERAL is still fairly new and there is a huge body of code out there >> leveraging the 'bad' way, as it was for years and years the only way >> to do a number of useful things. > I have to side with what I believe is Tom's position on this one. I do > like the notion of throwing an error in cases where someone sent us > something that we're pretty sure is wrong, but I don't agree that we > should continue to carry on bug-compatibility with things that are > already one foot in the grave and really just need to be shoved all the > way in. FWIW, I think the case that we're looking at only comes up if you have more than one SRF expression in the tlist (Regina's example has that after *-expansion, though it might not've looked so to start with), and some of them are sort/group columns while others are not. So the question at hand is whether that falls within the scope of "huge body of code leveraging the old way" or whether it's too much of a corner case to want to tie ourselves down to. I wrote a patch that fixes this for the sort-column case, thus getting us back to the historical behavior; see attached. If we really wanted to be consistent, we'd have to look at pushing all SRFs clear back to the scanjoin_target list if any SRFs appear in grouping columns. That would be significantly more code and it would deviate from the historical behavior, as I illustrated upthread, so I'm not really inclined to do it. But the historical behavior in this area is pretty unprincipled. It's also worth noting that we've had multiple complaints about the ExecTargetList least-common-multiple behavior over the years; it seems sane enough in examples like these where all the SRFs have the same period, but it gets way less so as soon as they don't. I'd love to toss the entire SRF-in-tlist feature overboard one of these years, both because of semantic issues like these and because a fair amount of code and overhead could be ripped out if it were disallowed. But I don't know how we get to that --- as Merlin says, there's a pretty large amount of application code depending on the feature. In the meantime I suppose there's a case to be made for preserving bug compatibility as much as possible. So anyway the question is whether to commit the attached or not. regards, tom lane diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 5229c84..db347b8 100644 *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *** make_pathkeys_for_window(PlannerInfo *ro *** 4615,4625 * * Our current policy is to postpone volatile expressions till after the sort * unconditionally (assuming that that's possible, ie they are in plain tlist ! * columns and not ORDER BY/GROUP BY/DISTINCT columns). We also postpone ! * set-returning expressions unconditionally (if possible), because running ! * them beforehand would bloat the sort dataset, and because it might cause ! * unexpected output order if the sort isn't stable. Expensive expressions ! * are postponed if there is a LIMIT, or if root->tuple_fraction shows that * partial evaluation of the query is possible (if neither is true, we expect * to have to evaluate the expressions for every row anyway), or if there are * any volatile or set-returning expressions (since once we've put in a --- 4615,4630 * * Our current policy is to postpone volatile expressions till after the sort * unconditionally (assuming that that's possible, ie they are in plain tlist ! * columns and not ORDER BY/GROUP BY/DISTINCT columns). We also prefer to ! * postpone set-returning expressions, because running them beforehand would ! * bloat the sort dataset, and because it might cause unexpected output order ! * if the sort isn't stable. However there's a constraint on that: all SRFs ! * in the tlist should be evaluated at the same plan step, so that they can ! * run in sync in ExecTargetList. So if any SRFs are in sort columns, we ! * mustn't postpone any SRFs. (Note that in principle that policy should ! * probably get applied to the group/window input targetlists too, but we ! * have not done that historically.) Lastly, expensive expressions are ! * postponed if there is a LIMIT, or if root->tuple_fraction shows that * partial evaluation of the query is possible (if neither is true, we expect * to have to evaluate the expressions for every row anyway), or if there are * any volatile or set-returning expressions (since o
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
* Merlin Moncure (mmonc...@gmail.com) wrote: > On Wed, Mar 23, 2016 at 12:37 PM, Tom Lane wrote: > > which is both SQL-standard semantics and much more efficient than > > SRF-in-tlist. We've more or less deprecated SRF-in-tlist since we > > introduced LATERAL in 9.3. How much are we willing to do to stay > > bug-compatible with old behaviors here? > > I think we should, and the fact this was caught so early on the > release cycle underscores that. One of the problems is that there are > reasonable cases (note, not impacted by this bug) of this usage that > are still commonplace, for example: > > ysanalysis=# select unnest(current_schemas(true)); >unnest > > pg_catalog > public > > I'm something of a backwards compatibility zealot, but I've become one > for very good reasons. Personally, I'd rather we'd define precisely > the usages that are deprecated (I guess SRF-tlist in the presence of > FROM) and force them to error out with an appropriate HINT rather than > give a different answer than they used to. The problem here is that > LATERAL is still fairly new and there is a huge body of code out there > leveraging the 'bad' way, as it was for years and years the only way > to do a number of useful things. I have to side with what I believe is Tom's position on this one. I do like the notion of throwing an error in cases where someone sent us something that we're pretty sure is wrong, but I don't agree that we should continue to carry on bug-compatibility with things that are already one foot in the grave and really just need to be shoved all the way in. This isn't the only break in backwards compatibility we've had over the years and is pretty far from the largest (string escaping, anyone? or removing implicit casts?) and I'd argue we're better off for it. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
"David G. Johnston" writes: > On Wednesday, March 23, 2016, Tom Lane wrote: >> ... We've more or less deprecated SRF-in-tlist since we >> introduced LATERAL in 9.3. How much are we willing to do to stay >> bug-compatible with old behaviors here? > My gut reaction is that this is an unnecessary regression for the sake of a > performance optimization that is likely drowned out in the usage presented > anyway. > The pivot for me would be how hard would it be to maintain the old behavior > in this "more or less deprecated" scenario. I have no way to judge that. Well, it'd make make_sort_input_target() more complicated and a bit slower, mainly because it would have to check SRF-ness of sorting columns that it currently has no need to inspect at all. My concern here is not really with that, it's with trying to draw a boundary around what behavior we're going to promise bug-compatibility with. To illustrate, you really can get the multiple-expansions behavior with existing releases, eg in 9.4 regression=# SELECT (dumpset(f.test, 'hello world' || f.test)).*, f.test FROM generate_series(1,4) As f(test) GROUP BY junk2, f.test; id | junk1 | junk2 | test +---+---+-- 1 | 3hello world3 | 31|3 2 | 3hello world3 | 31|3 1 | 4hello world4 | 42|4 2 | 4hello world4 | 42|4 1 | 4hello world4 | 41|4 2 | 4hello world4 | 41|4 1 | 1hello world1 | 11|1 2 | 1hello world1 | 11|1 1 | 3hello world3 | 32|3 2 | 3hello world3 | 32|3 1 | 2hello world2 | 21|2 2 | 2hello world2 | 21|2 1 | 2hello world2 | 22|2 2 | 2hello world2 | 22|2 1 | 1hello world1 | 12|1 2 | 1hello world1 | 12|1 (16 rows) which is kind of fun to wrap your head around, but after awhile you realize that dumpset(...).junk2 is being evaluated before the GROUP BY and dumpset.(...).id after it, which is how come the grouping behavior is so obviously violated. AFAICS, the only way to ensure non-crazy behavior for such examples would be to force all SRFs in the tlist to be evaluated at the same plan step. Which we've never done in the past, and if we were to start doing so, it would cause a behavioral change for examples like this one. Anyway, my concern is just that we're deciding to stay bug-compatible with some behaviors that were not very well thought out to start with, and we don't even have a clear understanding of where the limits of that compatibility will be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
On Wed, Mar 23, 2016 at 12:37 PM, Tom Lane wrote: > which is both SQL-standard semantics and much more efficient than > SRF-in-tlist. We've more or less deprecated SRF-in-tlist since we > introduced LATERAL in 9.3. How much are we willing to do to stay > bug-compatible with old behaviors here? I think we should, and the fact this was caught so early on the release cycle underscores that. One of the problems is that there are reasonable cases (note, not impacted by this bug) of this usage that are still commonplace, for example: ysanalysis=# select unnest(current_schemas(true)); unnest pg_catalog public I'm something of a backwards compatibility zealot, but I've become one for very good reasons. Personally, I'd rather we'd define precisely the usages that are deprecated (I guess SRF-tlist in the presence of FROM) and force them to error out with an appropriate HINT rather than give a different answer than they used to. The problem here is that LATERAL is still fairly new and there is a huge body of code out there leveraging the 'bad' way, as it was for years and years the only way to do a number of useful things. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
On Wednesday, March 23, 2016, Tom Lane wrote: > "Regina Obe" > writes: > > In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev > > started failing. I traced the issue down to a behavior change in 9.6 > when > > dealing with output of set returning functions when used with (func).* > > syntax. > > > CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text) > > RETURNS TABLE(id integer, junk1 text, junk2 text) > > ... > > -- Get 16 rows in 9.6, Get 8 rows in 9.5 > > SELECT (dumpset(f.test, 'hello world' || f.test)).* > > FROM generate_series(1,4) As f(test) > > ORDER BY junk2; > > I think this is a side-effect of 9118d03a8 ("When appropriate, postpone > SELECT output expressions till after ORDER BY"). Previously, although you > got N evaluations of the SRF which is pretty horrid, they were all in the > same targetlist and hence ran in sync and produced only the expected > number of rows (thanks to the otherwise-indefensible ExecTargetList > behavior by which multiple SRF tlist expressions produce a number of rows > equal to the least common multiple of their periods, not the product). > That commit causes the evaluation of dumpset(...).junk1 to be postponed > till after the Sort step, but the evaluation of dumpset(...).junk2 > necessarily can't be. So now you get dumpset(...).junk2 inflating > the original rowcount 2X, and then dumpset(...).junk1 inflating it > another 2X after the Sort. > > We could remain bug-compatible with the old behavior by adding some > restriction to keep all the tlist SRFs getting evaluated at the same > plan step, at least to the extent that we can. I think you could get > similar strange behaviors in prior releases if you used GROUP BY or > another syntax that might result in early evaluation of the sort column, > and we're not going to be able to fix that. But we could prevent this > particular optimization from introducing new strangeness. > > But I'm not really sure that we should. The way that you *should* > write this query is > > SELECT ds.* > FROM generate_series(1,4) AS f(test), > dumpset(f.test, 'hello world' || f.test) AS ds > ORDER BY junk2; > > which is both SQL-standard semantics and much more efficient than > SRF-in-tlist. We've more or less deprecated SRF-in-tlist since we > introduced LATERAL in 9.3. How much are we willing to do to stay > bug-compatible with old behaviors here? > > My gut reaction is that this is an unnecessary regression for the sake of a performance optimization that is likely drowned out in the usage presented anyway. The pivot for me would be how hard would it be to maintain the old behavior in this "more or less deprecated" scenario. I have no way to judge that. David J.
Re: [HACKERS] PostgreSQL 9.6 behavior change with set returning (funct).*
"Regina Obe" writes: > In the past couple of weeks our PostGIS tests against PostgreSQL 9.6 dev > started failing. I traced the issue down to a behavior change in 9.6 when > dealing with output of set returning functions when used with (func).* > syntax. > CREATE OR REPLACE FUNCTION dumpset(param_num integer, param_text text) > RETURNS TABLE(id integer, junk1 text, junk2 text) > ... > -- Get 16 rows in 9.6, Get 8 rows in 9.5 > SELECT (dumpset(f.test, 'hello world' || f.test)).* > FROM generate_series(1,4) As f(test) > ORDER BY junk2; I think this is a side-effect of 9118d03a8 ("When appropriate, postpone SELECT output expressions till after ORDER BY"). Previously, although you got N evaluations of the SRF which is pretty horrid, they were all in the same targetlist and hence ran in sync and produced only the expected number of rows (thanks to the otherwise-indefensible ExecTargetList behavior by which multiple SRF tlist expressions produce a number of rows equal to the least common multiple of their periods, not the product). That commit causes the evaluation of dumpset(...).junk1 to be postponed till after the Sort step, but the evaluation of dumpset(...).junk2 necessarily can't be. So now you get dumpset(...).junk2 inflating the original rowcount 2X, and then dumpset(...).junk1 inflating it another 2X after the Sort. We could remain bug-compatible with the old behavior by adding some restriction to keep all the tlist SRFs getting evaluated at the same plan step, at least to the extent that we can. I think you could get similar strange behaviors in prior releases if you used GROUP BY or another syntax that might result in early evaluation of the sort column, and we're not going to be able to fix that. But we could prevent this particular optimization from introducing new strangeness. But I'm not really sure that we should. The way that you *should* write this query is SELECT ds.* FROM generate_series(1,4) AS f(test), dumpset(f.test, 'hello world' || f.test) AS ds ORDER BY junk2; which is both SQL-standard semantics and much more efficient than SRF-in-tlist. We've more or less deprecated SRF-in-tlist since we introduced LATERAL in 9.3. How much are we willing to do to stay bug-compatible with old behaviors here? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers