Re: [HACKERS] LATERAL quals revisited
Some time ago, I wrote: I've been studying the bug reported at http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local ... After some contemplation, I think that the most practical way to fix this is for deconstruct_recurse and distribute_qual_to_rels to effectively move such a qual to the place where it logically belongs; that is, rather than processing it when we look at the lower WHERE clause, set it aside for a moment and then add it back when looking at the ON clause of the appropriate outer join. This should be reasonably easy to do by keeping a list of postponed lateral clauses while we're scanning the join tree. Here's a draft patch for this. The comments need a bit more work probably, but barring objection I want to push this in before this afternoon's 9.3rc1 wrap. regards, tom lane diff --git a/src/backend/optimizer/plan/initsplan.c b/src/backend/optimizer/plan/initsplan.c index 98f601c..e055088 100644 *** a/src/backend/optimizer/plan/initsplan.c --- b/src/backend/optimizer/plan/initsplan.c *** int from_collapse_limit; *** 36,47 int join_collapse_limit; static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex); static void add_lateral_info(PlannerInfo *root, Relids lhs, Relids rhs); static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, ! Relids *qualscope, Relids *inner_join_rels); static SpecialJoinInfo *make_outerjoininfo(PlannerInfo *root, Relids left_rels, Relids right_rels, Relids inner_join_rels, --- 36,56 int join_collapse_limit; + /* Elements of the postponed_qual_list used during deconstruct_recurse */ + typedef struct PostponedQual + { + Node *qual; /* a qual clause waiting to be processed */ + Relids relids; /* the set of baserels it references */ + } PostponedQual; + + static void extract_lateral_references(PlannerInfo *root, RelOptInfo *brel, Index rtindex); static void add_lateral_info(PlannerInfo *root, Relids lhs, Relids rhs); static List *deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, ! Relids *qualscope, Relids *inner_join_rels, ! List **postponed_qual_list); static SpecialJoinInfo *make_outerjoininfo(PlannerInfo *root, Relids left_rels, Relids right_rels, Relids inner_join_rels, *** static void distribute_qual_to_rels(Plan *** 53,59 Relids qualscope, Relids ojscope, Relids outerjoin_nonnullable, ! Relids deduced_nullable_relids); static bool check_outerjoin_delay(PlannerInfo *root, Relids *relids_p, Relids *nullable_relids_p, bool is_pushed_down); static bool check_equivalence_delay(PlannerInfo *root, --- 62,69 Relids qualscope, Relids ojscope, Relids outerjoin_nonnullable, ! Relids deduced_nullable_relids, ! List **postponed_qual_list); static bool check_outerjoin_delay(PlannerInfo *root, Relids *relids_p, Relids *nullable_relids_p, bool is_pushed_down); static bool check_equivalence_delay(PlannerInfo *root, *** add_lateral_info(PlannerInfo *root, Reli *** 630,644 List * deconstruct_jointree(PlannerInfo *root) { Relids qualscope; Relids inner_join_rels; /* Start recursion at top of jointree */ Assert(root-parse-jointree != NULL IsA(root-parse-jointree, FromExpr)); ! return deconstruct_recurse(root, (Node *) root-parse-jointree, false, ! qualscope, inner_join_rels); } /* --- 640,662 List * deconstruct_jointree(PlannerInfo *root) { + List *result; Relids qualscope; Relids inner_join_rels; + List *postponed_qual_list = NIL; /* Start recursion at top of jointree */ Assert(root-parse-jointree != NULL IsA(root-parse-jointree, FromExpr)); ! result = deconstruct_recurse(root, (Node *) root-parse-jointree, false, ! qualscope, inner_join_rels, ! postponed_qual_list); ! ! /* Shouldn't be any leftover quals */ ! Assert(postponed_qual_list == NIL); ! ! return result; } /* *** deconstruct_jointree(PlannerInfo *root) *** 656,668 * *inner_join_rels gets the set of base Relids syntactically included in * inner joins appearing at or below this jointree node (do not modify * or free this, either) * Return value is the appropriate joinlist for this jointree node * * In addition, entries will be added to root-join_info_list for outer joins. */ static List * deconstruct_recurse(PlannerInfo *root, Node *jtnode, bool below_outer_join, ! Relids *qualscope, Relids *inner_join_rels) { List *joinlist; --- 674,689 * *inner_join_rels gets the set of base Relids syntactically included in * inner joins appearing at or below this jointree node (do not modify * or
Re: [HACKERS] LATERAL quals revisited
Antonin Houska antonin.hou...@gmail.com writes: On 07/04/2013 06:11 PM, Antonin Houska wrote: On 07/03/2013 08:32 PM, Tom Lane wrote: Another possibility would be to keep the optimization, but disable it in queries that use LATERAL. I don't much care for that though --- seems too Rube Goldbergish, and in any case I have a lot less faith in the whole concept now than I had before I started digging into this issue. I constructed a query that triggers the optimization - see attachment with comments. Thanks for poking at this. EXPLAIN shows the same plan with or without the ph_may_need optimization, but that might be data problem (my tables are empty). Yeah, I didn't have much luck getting a different plan even with data in the tables. What you'd need for this to be important would be for a join order that's precluded without the ph_may_need logic to be significantly better than the join orders that are still allowed. While that's certainly within the realm of possibility, the difficulty of triggering the case at all reinforces my feeling that this optimization isn't worth bothering with. For the moment I'm just going to take it out. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
I have couple of questions. On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: I've been studying the bug reported at http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local that the planner can do the wrong thing with queries like SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true; I think the fundamental problem is that, because the i.n = j.n clause appears syntactically in WHERE, the planner is treating it as if it were an inner-join clause; but really it ought to be considered a clause of the upper LEFT JOIN. That is, semantically this query ought to be equivalent to SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n; However, because distribute_qual_to_rels doesn't see the clause as being attached to the outer join, it's not marked with the correct properties and ends up getting evaluated in the wrong place (as a filter clause not a join filter clause). The bug is masked in the test cases we've used so far because those cases are designed to let the clause get pushed down into the scan of the inner relation --- but if it doesn't get pushed down, it's evaluated the wrong way. After some contemplation, I think that the most practical way to fix this is for deconstruct_recurse and distribute_qual_to_rels to effectively move such a qual to the place where it logically belongs; that is, rather than processing it when we look at the lower WHERE clause, set it aside for a moment and then add it back when looking at the ON clause of the appropriate outer join. This should be reasonably easy to do by keeping a list of postponed lateral clauses while we're scanning the join tree. For there to *be* a unique appropriate outer join, we need to require that a LATERAL-using qual clause that's under an outer join contain lateral references only to the outer side of the nearest enclosing outer join. There's no such restriction in the spec of course, but we can make it so by refusing to flatten a sub-select if pulling it up would result in having a clause in the outer query that violates this rule. There's already some code in prepjointree.c (around line 1300) that attempts to enforce this, though now that I look at it again I'm not sure it's covering all the bases. We may need to extend that check. Why do we need this restriction? Wouldn't a place (specifically join qual at such a place) in join tree where all the participating relations are present, serve as a place where the clause can be applied. E.g. in the query select * from tab1 left join tab2 t2 using (val) left join lateral (select val from tab2 where val2 = tab1.val * t2.val) t3 using (val); Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a place where we are computing join between tab1, t2 and t3? I'm inclined to process all LATERAL-using qual clauses this way, ie postpone them till we recurse back up to a place where they can logically be evaluated. That won't make any real difference when no outer joins are present, but it will eliminate the ugliness that right now distribute_qual_to_rels is prevented from sanity-checking the scope of the references in a qual when LATERAL is present. If we do it like this, we can resurrect full enforcement of that sanity check, and then throw an error if any postponed quals are left over when we're done recursing. Parameterized nested loop join would always be able to evaluate a LATERAL query. Instead of throwing error, why can't we choose that as the default strategy whenever we fail to flatten subquery? Can we put the clause with lateral references at its appropriate place while flattening the subquery? IMO, that will be cleaner and lesser work than first pulling the clause and then putting it back again? Right, now, we do not have that capability in pull_up_subqueries() but given its recursive structure, it might be easier to do it there. Thoughts, better ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Best Wishes, Ashutosh Bapat EntepriseDB Corporation The Postgres Database Company
Re: [HACKERS] LATERAL quals revisited
Ashutosh Bapat ashutosh.ba...@enterprisedb.com writes: On Wed, Jun 26, 2013 at 1:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: For there to *be* a unique appropriate outer join, we need to require that a LATERAL-using qual clause that's under an outer join contain lateral references only to the outer side of the nearest enclosing outer join. There's no such restriction in the spec of course, but we can make it so by refusing to flatten a sub-select if pulling it up would result in having a clause in the outer query that violates this rule. There's already some code in prepjointree.c (around line 1300) that attempts to enforce this, though now that I look at it again I'm not sure it's covering all the bases. We may need to extend that check. Why do we need this restriction? Wouldn't a place (specifically join qual at such a place) in join tree where all the participating relations are present, serve as a place where the clause can be applied. No. If you hoist a qual that appears below an outer join to above the outer join, you get wrong results in general: you might eliminate rows from the outer side of the join, which a qual from within the inner side should never be able to do. select * from tab1 left join tab2 t2 using (val) left join lateral (select val from tab2 where val2 = tab1.val * t2.val) t3 using (val); Can't we apply (as a join qual) the qual val2 = tab1.val * t2.val at a place where we are computing join between tab1, t2 and t3? This particular example doesn't violate the rule I gave above, since both tab1 and t2 are on the left side of the join to the lateral subquery, and the qual doesn't have to get hoisted *past* an outer join, only to the outer join of {tab1,t2} with {t3}. I'm inclined to process all LATERAL-using qual clauses this way, ie postpone them till we recurse back up to a place where they can logically be evaluated. That won't make any real difference when no outer joins are present, but it will eliminate the ugliness that right now distribute_qual_to_rels is prevented from sanity-checking the scope of the references in a qual when LATERAL is present. If we do it like this, we can resurrect full enforcement of that sanity check, and then throw an error if any postponed quals are left over when we're done recursing. Parameterized nested loop join would always be able to evaluate a LATERAL query. Instead of throwing error, why can't we choose that as the default strategy whenever we fail to flatten subquery? I think you misunderstood. That error would only be a sanity check that we'd accounted for all qual clauses, it's not something a user should ever see. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
On 07/04/2013 06:11 PM, Antonin Houska wrote: On 07/03/2013 08:32 PM, Tom Lane wrote: Another possibility would be to keep the optimization, but disable it in queries that use LATERAL. I don't much care for that though --- seems too Rube Goldbergish, and in any case I have a lot less faith in the whole concept now than I had before I started digging into this issue. Thoughts? I noticed EXPLAIN in some regression tests. So if they all pass after removal of this optimization, it might indicate that it was really insignificant. But alternatively it may just be a lack of focus on this feature in the test queries. Digging for (non-LATERAL) queries or rather patterns where the ph_may_need optimization clearly appears to be important sounds to me like a good SQL exercise, but I'm afraid I won't have time for it in the next few days. I constructed a query that triggers the optimization - see attachment with comments. (Note that the relid sets are derived from my current knowledge of the logic. I haven't figured out how to check them easily in gdb session.) The intention was that the top-level OJ references LHS of the join below rather than the RHS. That should increase the likelihood that the PHV becomes the only obstacle for join commuting. And therefore the ph_may_need optimization should unblock some combinations that would be impossible otherwise. However I could not see the condition if (bms_is_subset(phinfo-ph_may_need, min_righthand)) continue; met for the top-level join even though the supposed ph_may_need did not contain tab1. Then it struck me that min_righthand can be the problem. So I changed the join clause to reference RHS of j1, hoping that it should make min_righthand bigger. And that really triggered the condition. EXPLAIN shows the same plan with or without the ph_may_need optimization, but that might be data problem (my tables are empty). More important is the fact that I could only avoid addition of the PHV's eval_at to min_righthand at the cost of adding the whole j1 join (i.e. more than just eval_at). Although the idea behind ph_may_need is clever, I can now imagine that other techniques of the planner can substitute for it. There might be examples showing the opposite but such are beyond my imagination. // Antonin Houska (Tony) SELECT tab1.i FROMtab1 -- The ph_may_need optimization should be effective for the -- top-level LEFT JOIN. The PHV in sub1 is only referenced below it. LEFT JOIN ( tab2 LEFT JOIN ( tab3 LEFT JOIN ( SELECT -- This expression should be wrapped in the PHV -- That PHV should have eval_at = {tab4, tab5}. -- -- Join clause of j1 is the highest reference to the PHV. -- Thus ph_may_need should be {tab2, tab3, tab4, tab5}. Therefore -- the ph_may_need optimization should avoid addition of eval_at -- to min_righthand of the top-level join's SpecialJoinInfo. COALESCE(tab4.l, tab5.m, 1) AS x FROM tab4 LEFT JOIN tab5 ON l = m ) AS sub1(x) ON tab3.k = sub1.x ) AS j2(k, x) ON tab2.j = j2.x ) AS j1(j, k) -- This clause references j.k (RHS of the lower join) to keep min_righthand -- of the top-level join bigger (ph_may_need needs to be its subset). ON tab1.i = j1.k; CREATE TABLE tab1(i int); CREATE TABLE tab2(j int); CREATE TABLE tab3(k int); CREATE TABLE tab4(l int); CREATE TABLE tab5(m int); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
On 07/03/2013 08:32 PM, Tom Lane wrote: Another possibility would be to keep the optimization, but disable it in queries that use LATERAL. I don't much care for that though --- seems too Rube Goldbergish, and in any case I have a lot less faith in the whole concept now than I had before I started digging into this issue. Thoughts? I noticed EXPLAIN in some regression tests. So if they all pass after removal of this optimization, it might indicate that it was really insignificant. But alternatively it may just be a lack of focus on this feature in the test queries. Digging for (non-LATERAL) queries or rather patterns where the ph_may_need optimization clearly appears to be important sounds to me like a good SQL exercise, but I'm afraid I won't have time for it in the next few days. //Antonin Houska (Tony) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
I wrote: So attached is a draft patch for this. It's not complete yet because there are various comments that are now wrong and need to be updated; but I think the code is functioning correctly. Hm, spoke too soon :-(. This query causes an assertion failure, with or without my draft patch: select c.*,a.*,ss1.q1,ss2.q1,ss3.* from int8_tbl c left join ( int8_tbl a left join (select q1, coalesce(q2,f1) as x from int8_tbl b, int4_tbl b2) ss1 on a.q2 = ss1.q1 cross join lateral (select q1, coalesce(ss1.x,q2) as y from int8_tbl d) ss2 ) on c.q2 = ss2.q1, lateral (select * from int4_tbl i where ss2.y f1) ss3; TRAP: FailedAssertion(!(bms_is_subset(phinfo-ph_needed, phinfo-ph_may_need)), File: initsplan.c, Line: 213) What's happening is that distribute_qual_to_rels concludes (correctly) that the ss2.y f1 clause must be postponed until after the nest of left joins, since those could null ss2.y. So the PlaceHolderVar for ss2.y is marked as being needed at the topmost join level. However, find_placeholders_in_jointree had only marked the PHV as being maybe needed to scan the i relation, since that's what the syntactic location of the reference implies. Since we depend on the assumption that ph_needed is always a subset of ph_may_need, there's an assertion that fires if that stops being true, and that's what's crashing. After some thought about this, I'm coming to the conclusion that lateral references destroy the ph_maybe_needed optimization altogether: we cannot derive an accurate estimate of where a placeholder will end up in the final qual distribution, short of essentially doing all the work in deconstruct_jointree over again. I guess in principle we could repeat deconstruct_jointree until we had stable estimates of the ph_needed locations, but that would be expensive and probably would induce a lot of new planner bugs (since the data structure changes performed during deconstruct_jointree aren't designed to be backed out easily). The only place where ph_may_need is actually used is in this bit in make_outerjoininfo(): /* * Examine PlaceHolderVars. If a PHV is supposed to be evaluated within * this join's nullable side, and it may get used above this join, then * ensure that min_righthand contains the full eval_at set of the PHV. * This ensures that the PHV actually can be evaluated within the RHS. * Note that this works only because we should already have determined the * final eval_at level for any PHV syntactically within this join. */ foreach(l, root-placeholder_list) { PlaceHolderInfo *phinfo = (PlaceHolderInfo *) lfirst(l); Relidsph_syn_level = phinfo-ph_var-phrels; /* Ignore placeholder if it didn't syntactically come from RHS */ if (!bms_is_subset(ph_syn_level, right_rels)) continue; /* We can also ignore it if it's certainly not used above this join */ /* XXX this test is probably overly conservative */ if (bms_is_subset(phinfo-ph_may_need, min_righthand)) continue; /* Else, prevent join from being formed before we eval the PHV */ min_righthand = bms_add_members(min_righthand, phinfo-ph_eval_at); } Looking at it again, it's not really clear that skipping placeholders in this way results in very much optimization --- sometimes we can avoid constraining join order, but how often? I tried diking out the check on ph_may_need from this loop, and saw no changes in the regression test results (not that that proves a whole lot about optimization of complex queries). So I'm pretty tempted to just remove ph_may_need, along with the machinery that computes it. Another possibility would be to keep the optimization, but disable it in queries that use LATERAL. I don't much care for that though --- seems too Rube Goldbergish, and in any case I have a lot less faith in the whole concept now than I had before I started digging into this issue. Thoughts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
On 06/26/2013 12:52 AM, Tom Lane wrote: Instead of setting it aside, can we (mis)use placeholder var (PHV), to ensure that the WHERE clause is evaluated below the OJ; instead of combining it with the ON clause? No, that doesn't help; it has to be part of the joinquals at the join node, or you don't get the right execution semantics. When I wrote 'below the OJ' I meant to retain something of the original semantics (just like the subquery applies the WHERE clause below the OJ). However that's probably too restrictive and your next arguments Plus you could lose some optimization opportunities, for example if we fail to see that there's a strict join clause associated with the outer join (cf lhs_strict). Worse, I think wrapping a PHV around an otherwise indexable clause would prevent using it for an indexscan. also confirm the restrictiveness. So I can forget. One more concern anyway: doesn't your proposal make subquery pull-up a little bit risky in terms of cost of the resulting plan? IMO the subquery in the original query may filter out many rows and thus decrease the number of pairs to be evaluated by the join the ON clause belongs to. If the WHERE clause moves up, then the resulting plan might be less efficient than the one we'd get if the subquery hadn't been pulled-up at all. However at the time of cost evaluation there's no way to get back (not even to notice the higher cost) because the original subquery has gone at earlier stage of the planning. Regards, Antonin Houska (Tony) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
Antonin Houska antonin.hou...@gmail.com writes: If the WHERE clause moves up, then the resulting plan might be less efficient than the one we'd get if the subquery hadn't been pulled-up at all. No, because we can push the qual back down again (using a parameterized path) if that's appropriate. The problem at this stage is to understand the semantics of the outer join correctly, not to make a choice of what the plan will be. In fact, the reason we'd not noticed this bug before is exactly that all the test cases in the regression tests do end up pushing the qual back down. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] LATERAL quals revisited
I've been studying the bug reported at http://www.postgresql.org/message-id/20130617235236.GA1636@jeremyevans.local that the planner can do the wrong thing with queries like SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true; I think the fundamental problem is that, because the i.n = j.n clause appears syntactically in WHERE, the planner is treating it as if it were an inner-join clause; but really it ought to be considered a clause of the upper LEFT JOIN. That is, semantically this query ought to be equivalent to SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j) j ON i.n = j.n; However, because distribute_qual_to_rels doesn't see the clause as being attached to the outer join, it's not marked with the correct properties and ends up getting evaluated in the wrong place (as a filter clause not a join filter clause). The bug is masked in the test cases we've used so far because those cases are designed to let the clause get pushed down into the scan of the inner relation --- but if it doesn't get pushed down, it's evaluated the wrong way. After some contemplation, I think that the most practical way to fix this is for deconstruct_recurse and distribute_qual_to_rels to effectively move such a qual to the place where it logically belongs; that is, rather than processing it when we look at the lower WHERE clause, set it aside for a moment and then add it back when looking at the ON clause of the appropriate outer join. This should be reasonably easy to do by keeping a list of postponed lateral clauses while we're scanning the join tree. For there to *be* a unique appropriate outer join, we need to require that a LATERAL-using qual clause that's under an outer join contain lateral references only to the outer side of the nearest enclosing outer join. There's no such restriction in the spec of course, but we can make it so by refusing to flatten a sub-select if pulling it up would result in having a clause in the outer query that violates this rule. There's already some code in prepjointree.c (around line 1300) that attempts to enforce this, though now that I look at it again I'm not sure it's covering all the bases. We may need to extend that check. I'm inclined to process all LATERAL-using qual clauses this way, ie postpone them till we recurse back up to a place where they can logically be evaluated. That won't make any real difference when no outer joins are present, but it will eliminate the ugliness that right now distribute_qual_to_rels is prevented from sanity-checking the scope of the references in a qual when LATERAL is present. If we do it like this, we can resurrect full enforcement of that sanity check, and then throw an error if any postponed quals are left over when we're done recursing. Thoughts, better ideas? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
(Please excuse me if my proposal sounds silly, i'm still not too advanced in this area...) On 06/25/2013 10:00 PM, Tom Lane wrote: After some contemplation, I think that the most practical way to fix this is for deconstruct_recurse and distribute_qual_to_rels to effectively move such a qual to the place where it logically belongs; that is, rather than processing it when we look at the lower WHERE clause, set it aside for a moment and then add it back when looking at the ON clause of the appropriate outer join. This should be reasonably easy to do by keeping a list of postponed lateral clauses while we're scanning the join tree. Instead of setting it aside, can we (mis)use placeholder var (PHV), to ensure that the WHERE clause is evaluated below the OJ; instead of combining it with the ON clause? That would be a special PHV(s) in that it's not actually referenced from outside the subquery. Whether I'm right or not, I seem to have found another problem while trying to enforce such a PHV: postgres=# SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i) FROM j WHERE (i.n = j.n)) j ON true; The connection to the server was lost. Attempting reset: Failed. TRAP: FailedAssertion(!(!bms_overlap(min_lefthand, min_righthand)), File: initsplan.c, Line: 1043) LOG: server process (PID 24938) was terminated by signal 6: Aborted DETAIL: Failed process was running: SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i) FROM j WHERE (i.n = j.n)) j ON true; LOG: terminating any other active server processes WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. FATAL: the database system is in recovery mode I'm not able to judge right now whether the Assert() statement is the problem itself or anything else. You'll probably know better. (4f14c86d7434376b95477aeeb07fcc7272f4c47d is the last commit in my environment) Regards, Antonin Houska (Tony) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL quals revisited
Antonin Houska antonin.hou...@gmail.com writes: On 06/25/2013 10:00 PM, Tom Lane wrote: After some contemplation, I think that the most practical way to fix this is for deconstruct_recurse and distribute_qual_to_rels to effectively move such a qual to the place where it logically belongs; that is, rather than processing it when we look at the lower WHERE clause, set it aside for a moment and then add it back when looking at the ON clause of the appropriate outer join. Instead of setting it aside, can we (mis)use placeholder var (PHV), to ensure that the WHERE clause is evaluated below the OJ; instead of combining it with the ON clause? No, that doesn't help; it has to be part of the joinquals at the join node, or you don't get the right execution semantics. Plus you could lose some optimization opportunities, for example if we fail to see that there's a strict join clause associated with the outer join (cf lhs_strict). Worse, I think wrapping a PHV around an otherwise indexable clause would prevent using it for an indexscan. Whether I'm right or not, I seem to have found another problem while trying to enforce such a PHV: postgres=# SELECT i.*, j.* FROM i LEFT JOIN LATERAL (SELECT COALESCE(i) FROM j WHERE (i.n = j.n)) j ON true; The connection to the server was lost. Attempting reset: Failed. [ pokes at that ... ] Hm, right offhand this seems like an independent issue --- the ph_eval_at for the PHV is wrong AFAICS. Thanks for reporting it! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL, UNNEST and spec compliance
On Thu, Jan 24, 2013 at 09:12:41PM -0800, David Fetter wrote: On Thu, Jan 24, 2013 at 09:51:46AM -0800, David Fetter wrote: Folks, Andrew Gierth asked me to send this out as his email is in a parlous state at the moment. My comments will follow in replies. Without further ado: [snip] As I see it, the current options are: 1. Do nothing, and insist on non-standard use of the LATERAL keyword. 2. Add UNNEST to the grammar (or parse analysis) as a special case, making it implicitly LATERAL. (This would make implementing S301 easier, but special cases are ugly.) 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL. (As far as I can tell, those cases whose behaviour would be changed by this actually produce errors in versions prior to 9.3, so no working code should be affected.) Since LATERAL is new in 9.3, I think the pros and cons of these choices should be considered now, rather than being allowed to slide by unexamined. Please find attached a patch which implements approach 3. The vast majority of it is changes to the regression tests. The removed regression tests in join.{sql,out} are no longer errors, although some of them are pretty standard DoS attacks, hence they're all removed. Cheers, David. Oops. Misspelled rtekind in the previous patch. Here's a corrected one, much shorter. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate *** a/src/backend/parser/gram.y --- b/src/backend/parser/gram.y *** *** 9391,9397 table_ref: relation_expr opt_alias_clause | func_table func_alias_clause { RangeFunction *n = makeNode(RangeFunction); ! n-lateral = false; n-funccallnode = $1; n-alias = linitial($2); n-coldeflist = lsecond($2); --- 9391,9397 | func_table func_alias_clause { RangeFunction *n = makeNode(RangeFunction); ! n-lateral = true; n-funccallnode = $1; n-alias = linitial($2); n-coldeflist = lsecond($2); *** a/src/backend/utils/adt/ruleutils.c --- b/src/backend/utils/adt/ruleutils.c *** *** 7928,7934 get_from_clause_item(Node *jtnode, Query *query, deparse_context *context) deparse_columns *colinfo = deparse_columns_fetch(varno, dpns); boolprintalias; ! if (rte-lateral) appendStringInfoString(buf, LATERAL ); /* Print the FROM item proper */ --- 7928,7934 deparse_columns *colinfo = deparse_columns_fetch(varno, dpns); boolprintalias; ! if (rte-lateral rte-rtekind != RTE_FUNCTION) appendStringInfoString(buf, LATERAL ); /* Print the FROM item proper */ *** a/src/test/regress/expected/join.out --- b/src/test/regress/expected/join.out *** *** 3577,3603 select * from Output: (COALESCE((COALESCE(b.q2, 42::bigint)), d.q2)) (26 rows) - -- test some error cases where LATERAL should have been used but wasn't - select f1,g from int4_tbl a, generate_series(0, f1) g; - ERROR: column f1 does not exist - LINE 1: select f1,g from int4_tbl a, generate_series(0, f1) g; - ^ - HINT: There is a column named f1 in table a, but it cannot be referenced from this part of the query. - select f1,g from int4_tbl a, generate_series(0, a.f1) g; - ERROR: invalid reference to FROM-clause entry for table a - LINE 1: select f1,g from int4_tbl a, generate_series(0, a.f1) g; - ^ - HINT: There is an entry for table a, but it cannot be referenced from this part of the query. - select f1,g from int4_tbl a cross join generate_series(0, f1) g; - ERROR: column f1 does not exist - LINE 1: ...ct f1,g from int4_tbl a cross join generate_series(0, f1) g; - ^ - HINT: There is a column named f1 in table a, but it cannot be referenced from this part of the query. - select f1,g from int4_tbl a cross join generate_series(0, a.f1) g; - ERROR: invalid reference to FROM-clause entry for table a - LINE 1: ... f1,g from
Re: [HACKERS] LATERAL, UNNEST and spec compliance
* David Fetter (da...@fetter.org) wrote: As I see it, the current options are: 1. Do nothing, and insist on non-standard use of the LATERAL keyword. I'm not a big fan of this. Providing a good error message saying you need to use LATERAL for this query to work makes it slightly better, but I don't feel like there's really any ambiguity here. 2. Add UNNEST to the grammar (or parse analysis) as a special case, making it implicitly LATERAL. (This would make implementing S301 easier, but special cases are ugly.) This I really don't like. 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL. (As far as I can tell, those cases whose behaviour would be changed by this actually produce errors in versions prior to 9.3, so no working code should be affected.) +1 for me on this idea. If you're calling an SRF, passing in a lateral value, 'LATERAL' seems like it's just a noise word, and apparently the SQL authors felt the same, as they don't require it for unnest(). Since LATERAL is new in 9.3, I think the pros and cons of these choices should be considered now, rather than being allowed to slide by unexamined. I agree that we should really hammer this down before 9.3 is out the door. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] LATERAL, UNNEST and spec compliance
Stephen Frost sfr...@snowman.net writes: * David Fetter (da...@fetter.org) wrote: 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL. (As far as I can tell, those cases whose behaviour would be changed by this actually produce errors in versions prior to 9.3, so no working code should be affected.) +1 for me on this idea. If you're calling an SRF, passing in a lateral value, 'LATERAL' seems like it's just a noise word, and apparently the SQL authors felt the same, as they don't require it for unnest(). At first I didn't like this idea, but it's growing on me. However ... David is wrong to claim that it's zero-risk. It's true that an SRF can't contain any side-references today, but it can contain an outer reference. Consider a case like SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...) In existing releases the y could be a valid outer reference to a.y. If b also has a column y, David's proposal would cause us to prefer that interpretation, since b.y would be more closely nested than a.y. If you're lucky, you'd get a type-mismatch error, but if the two y's are of similar datatypes the query would just silently do something different than it used to. This is a little bit far-fetched, but it could happen. As against that, we make incompatible changes in every release, and it does seem like assuming LATERAL for functions in FROM would be a usability gain most of the time. And special-casing UNNEST to satisfy the standard seems *really* ugly. I agree that we should really hammer this down before 9.3 is out the door. Yeah, if we're going to do this it'd make the most sense to do it in the same release that introduces LATERAL. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL, UNNEST and spec compliance
* Tom Lane (t...@sss.pgh.pa.us) wrote: However ... David is wrong to claim that it's zero-risk. It's true that an SRF can't contain any side-references today, but it can contain an outer reference. Consider a case like SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...) I see what you mean, but on the other hand, that looks like something we might actually want to complain about as 'y' is pretty clearly ambiguous here. I'm a bit surprised that doesn't already throw an error. This is a little bit far-fetched, but it could happen. As against that, we make incompatible changes in every release, and it does seem like assuming LATERAL for functions in FROM would be a usability gain most of the time. And special-casing UNNEST to satisfy the standard seems *really* ugly. It's definitely far-fetched, imv. If it's possible, within reason, to explicitly throw a please disambiguate 'y' type of error in those specific cases, that'd be nice, but I don't think it'd be required. A mention in the release notes would be sufficient. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] LATERAL, UNNEST and spec compliance
* Tom Lane (t...@sss.pgh.pa.us) wrote: SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...) Actually, this appears to fail already, at least in 9.2.2: = select * from (values (1)) v(a) where v.a in (select x from (values (2)) v2(a), - generate_series(1,a) x); ERROR: function expression in FROM cannot refer to other relations of same query level LINE 2: generate_series(1,a) x); ^ Unless it's something else that you were referring to...? Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] LATERAL, UNNEST and spec compliance
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: SELECT ... FROM a WHERE a.x IN (SELECT ... FROM b, srf(y) WHERE ...) Actually, this appears to fail already, at least in 9.2.2: = select * from (values (1)) v(a) where v.a in (select x from (values (2)) v2(a), - generate_series(1,a) x); ERROR: function expression in FROM cannot refer to other relations of same query level LINE 2: generate_series(1,a) x); ^ Huh ... you're right, I'd forgotten about that. That's an ancient bug that got fixed in passing in the LATERAL work. So, as long as we're not going to fix that bug in the back branches (which would be difficult anyway IIRC), we don't have a compatibility problem ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL, UNNEST and spec compliance
David Fetter da...@fetter.org writes: Please find attached a patch which implements approach 3. The vast majority of it is changes to the regression tests. The removed regression tests in join.{sql,out} are no longer errors, although some of them are pretty standard DoS attacks, hence they're all removed. Here's a less quick-hack-y approach to that. regards, tom lane diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml index bcee9468240e8a10b8e491a8f1ab8a1e2c5d9ede..caa9f1b3389e5ce57e2e50d13011e41c0ed3d11b 100644 *** a/doc/src/sgml/queries.sgml --- b/doc/src/sgml/queries.sgml *** SELECT * *** 717,730 /indexterm para ! Subqueries and table functions appearing in literalFROM/ can be preceded by the key word literalLATERAL/. This allows them to reference columns provided by preceding literalFROM/ items. ! (Without literalLATERAL/literal, each literalFROM/ item is evaluated independently and so cannot cross-reference any other literalFROM/ item.) A literalLATERAL/literal item can appear at top level in the ! literalFROM/ list, or within a literalJOIN/ tree; in the latter case it can also refer to any items that are on the left-hand side of a literalJOIN/ that it is on the right-hand side of. /para --- 717,740 /indexterm para ! Subqueries appearing in literalFROM/ can be preceded by the key word literalLATERAL/. This allows them to reference columns provided by preceding literalFROM/ items. ! (Without literalLATERAL/literal, each subquery is evaluated independently and so cannot cross-reference any other literalFROM/ item.) + /para + + para + Table functions appearing in literalFROM/ can also be + preceded by the key word literalLATERAL/, but for functions the + key word is optional; the function's arguments can contain references + to columns provided by preceding literalFROM/ items in any case. + /para + + para A literalLATERAL/literal item can appear at top level in the ! literalFROM/ list, or within a literalJOIN/ tree. In the latter case it can also refer to any items that are on the left-hand side of a literalJOIN/ that it is on the right-hand side of. /para *** FROM polygons p1 CROSS JOIN LATERAL vert *** 770,776 polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2 WHERE (v1 lt;-gt; v2) lt; 10 AND p1.id != p2.id; /programlisting ! or in several other equivalent formulations. /para para --- 780,788 polygons p2 CROSS JOIN LATERAL vertices(p2.poly) v2 WHERE (v1 lt;-gt; v2) lt; 10 AND p1.id != p2.id; /programlisting ! or in several other equivalent formulations. (As already mentioned, ! the literalLATERAL/ key word is unnecessary in this example, but ! we use it for clarity.) /para para diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 26d511fad8c5b8d02bda618006ce2606036db7c7..0f9d52753d832fa458aca563fa2bfcf558120818 100644 *** a/doc/src/sgml/ref/select.sgml --- b/doc/src/sgml/ref/select.sgml *** TABLE [ ONLY ] replaceable class=param *** 504,521 varlistentry termliteralLATERAL/literal/term listitem !paraThe literalLATERAL/literal key word can precede a ! sub-commandSELECT/command or function-call literalFROM/ ! item. This allows the sub-commandSELECT/command or function ! expression to refer to columns of literalFROM/ items that appear ! before it in the literalFROM/ list. (Without ! literalLATERAL/literal, each literalFROM/ item is evaluated ! independently and so cannot cross-reference any other ! literalFROM/ item.) A literalLATERAL/literal item can ! appear at top level in the literalFROM/ list, or within a ! literalJOIN/ tree; in the latter case it can also refer to any ! items that are on the left-hand side of a literalJOIN/ that it is ! on the right-hand side of. /para para --- 504,531 varlistentry termliteralLATERAL/literal/term listitem !para ! The literalLATERAL/literal key word can precede a ! sub-commandSELECT/command literalFROM/ item. This allows the ! sub-commandSELECT/command to refer to columns of literalFROM/ ! items that appear before it in the literalFROM/ list. (Without ! literalLATERAL/literal, each sub-commandSELECT/command is ! evaluated independently and so cannot cross-reference any other ! literalFROM/ item.) !/para ! !para ! literalLATERAL/literal can also precede a function-call ! literalFROM/ item, but in this case it is a noise word, because ! the function expression can refer to earlier
[HACKERS] LATERAL, UNNEST and spec compliance
Folks, Andrew Gierth asked me to send this out as his email is in a parlous state at the moment. My comments will follow in replies. Without further ado: SQL2008 says, for 7.6 table reference 6) a) If TR is contained in a from clause FC with no intervening query expression, then the scope clause SC of TR is the select statement: single row or innermost query specification that contains FC. The scope of a range variable of TR is the select list, where clause, group by clause, having clause, and window clause of SC, together with every lateral derived table that is simply contained in FC and is preceded by TR, and every collection derived table that is simply contained in FC and is preceded by TR, and the join condition of all joined tables contained in SC that contain TR. If SC is the query specification that is the query expression body of a simple table query STQ, then the scope of a range variable of TR also includes the order by clause of STQ. This is the clause that defines the scope effect of LATERAL, and as can be seen, it defines collection derived table, i.e. UNNEST(), as having the same behaviour as lateral derived table. It is also worth noting at this point that pg's FROM func() syntax is not in the spec (the nearest is FROM TABLE(collection value expression)). Our implementation of UNNEST currently deviates from the spec by not being implicitly LATERAL; given the (sub)query SELECT * FROM sometable, UNNEST(somearray); then somearray is required to be a parameter or outer reference rather than a column of sometable. To get the spec's behaviour for this, we currently have to do: SELECT * FROM sometable, LATERAL UNNEST(somearray); which is non-standard syntax. (In the spec, only table subquery can follow LATERAL.) (We also don't accept the (optional) syntax of S301, allowing multiple parameters to UNNEST().) As I see it, the current options are: 1. Do nothing, and insist on non-standard use of the LATERAL keyword. 2. Add UNNEST to the grammar (or parse analysis) as a special case, making it implicitly LATERAL. (This would make implementing S301 easier, but special cases are ugly.) 3. Make all cases of SRFs in the FROM-clause implicitly LATERAL. (As far as I can tell, those cases whose behaviour would be changed by this actually produce errors in versions prior to 9.3, so no working code should be affected.) Since LATERAL is new in 9.3, I think the pros and cons of these choices should be considered now, rather than being allowed to slide by unexamined. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] LATERAL and VOLATILE functions
Hello I tested some usage of LATERAL clause, and I found so LATERAL doesn't respects difference between VOLATILE and IMMUTABLE functions. Is this behave expected? -- unexpected postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT random()) x; ; v │ random ───┼── 1 │ 0.63025646051392 2 │ 0.63025646051392 3 │ 0.63025646051392 (3 rows) -- expected postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT random() - v + v) x; v │ ?column? ───┼─── 1 │ 0.381548477802426 2 │ 0.762988060247153 3 │ 0.181648664642125 (3 rows) Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL and VOLATILE functions
Pavel Stehule pavel.steh...@gmail.com writes: Is this behave expected? -- unexpected postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT random()) x; ; vrandom ---+-- 1 0.63025646051392 2 0.63025646051392 3 0.63025646051392 (3 rows) The LATERAL keyword is a no-op since x doesn't contain any side-reference to g(v). So you get a plain join between g and a single-row relation x. If the SQL standard actually specified what LATERAL means, we could argue about whether that's a correct interpretation or not. I haven't been able to find anyplace where the spec defines the semantics though. And I'm fairly certain that we *don't* want it to mean recompute for every row generated to the left of the keyword, whether there is a variable reference or not. Consider for example select ... from a, b, c join lateral d on ... If the D item only contains references to C, it's unlikely that the programmer wants it to be re-evaluated again for each possible row in A*B. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL and VOLATILE functions
2012/12/15 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: Is this behave expected? -- unexpected postgres=# select * from generate_series(1,3) g(v), LATERAL (SELECT random()) x; ; vrandom ---+-- 1 0.63025646051392 2 0.63025646051392 3 0.63025646051392 (3 rows) The LATERAL keyword is a no-op since x doesn't contain any side-reference to g(v). So you get a plain join between g and a single-row relation x. If the SQL standard actually specified what LATERAL means, we could argue about whether that's a correct interpretation or not. I haven't been able to find anyplace where the spec defines the semantics though. And I'm fairly certain that we *don't* want it to mean recompute for every row generated to the left of the keyword, whether there is a variable reference or not. Consider for example select ... from a, b, c join lateral d on ... If the D item only contains references to C, it's unlikely that the programmer wants it to be re-evaluated again for each possible row in A*B. Stable and immutable functions should be recalculated once time, but for volatile functions is recalculation probably more natural (expected). Every time is strange, when function random() returns same numbers. I am not sure if this behave can be problem in real usage - probably it can be a surprise for someone who use random() for some testing. Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lateral function as a subquery - WIP patch
Robert Haas robertmh...@gmail.com writes: On Fri, Mar 9, 2012 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Um ... how do you get the subquery result rows to join to only the correct rows of the other tables? This looks like an unconstrained join to me, which is not what I believe the SQL spec for LATERAL to be, and it doesn't seem especially useful either. (If a subquery could do what people wanted, we'd not be hearing all the requests for LATERAL.) I think LATERAL is intended as more or less an unconstrained nested loop with the lateral expression on the inner side, parameterized by value from the outer side. Typically it's a SRF. Um ... if it's parameterized by values from a current row of the outer side, then it's not an unconstrained join. That would be like doing an inner indexscan join and producing a cross-join result. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lateral function as a subquery - WIP patch
On Sat, Mar 10, 2012 at 4:16 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Mar 9, 2012 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Um ... how do you get the subquery result rows to join to only the correct rows of the other tables? This looks like an unconstrained join to me, which is not what I believe the SQL spec for LATERAL to be, and it doesn't seem especially useful either. (If a subquery could do what people wanted, we'd not be hearing all the requests for LATERAL.) I think LATERAL is intended as more or less an unconstrained nested loop with the lateral expression on the inner side, parameterized by value from the outer side. Typically it's a SRF. Um ... if it's parameterized by values from a current row of the outer side, then it's not an unconstrained join. That would be like doing an inner indexscan join and producing a cross-join result. True. I just meant that no join filter was implied. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lateral function as a subquery - WIP patch
On 03/10/2012 02:15 AM, Tom Lane wrote: Um ... how do you get the subquery result rows to join to only the correct rows of the other tables? The subquery just restricts the set of rows that the function has to evaluate. The main query is supposed to perform the join. I understand, such a join causes repeated scan of the function if the function is on the inner side. This looks like an unconstrained join to me, which is not what I believe the SQL spec for LATERAL to be, o.k., then just forget about my proposal. Thanks for your comments anyway, Tony H. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] lateral function as a subquery - WIP patch
Hello, in the related discussions mentioned on TODO list http://archives.postgresql.org/pgsql-hackers/2009-09/msg00292.php http://archives.postgresql.org/pgsql-hackers/2009-10/msg00991.php (The 1st is rather on SQL, I didn't focuss on it yet.) the implementation is discussed from optimizer/executor's point of view. I'm wondering why not to address the problem at earlier stage: rewrite the range function to a subquery. For example: SELECT * FROM a, b, func(a.i, b.j) as c, d WHERE a.i=b.j and b.j = d.k and c1 may become SELECT * FROM a, b, subquery as c, d WHERE a.i=b.j and b.j = d.k and c1 where subquery is SELECT func(a.i, b.j) FROM a,b WHERE a.i=b.j The WHERE clause of the original query is considered a list of ANDed subclauses. Given 'rt_index' is range table index of the function, only those subclauses are used in the substitution subquery having RT index lower than 'rt_index'. Even with such a partial qualification the subquery can safely exclude (from function calls) rows that the main query won't need anyway. Note that 1. This is rather an alternative to the optimizer/executor focused approach that the past discussions covered. I'm aware of questions about SQL conformance. 2. I only propose this for functions, not for general queries. 3. This draft does not deal with record-returning functions (Although I might have some idea how to treat them.). Is there any obvious reason not to go this way? Attached is my (experimental) implementation. Kind regards, Tony. diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c index 7f4da92..ac4f620 100644 --- a/src/backend/parser/parse_clause.c +++ b/src/backend/parser/parse_clause.c @@ -563,25 +563,6 @@ transformRangeFunction(ParseState *pstate, RangeFunction *r) assign_expr_collations(pstate, funcexpr); /* - * The function parameters cannot make use of any variables from other - * FROM items. (Compare to transformRangeSubselect(); the coding is - * different though because we didn't parse as a sub-select with its own - * level of namespace.) - * - * XXX this will need further work to support SQL99's LATERAL() feature, - * wherein such references would indeed be legal. - */ - if (pstate-p_relnamespace || pstate-p_varnamespace) - { - if (contain_vars_of_level(funcexpr, 0)) - ereport(ERROR, - (errcode(ERRCODE_INVALID_COLUMN_REFERENCE), - errmsg(function expression in FROM cannot refer to other relations of same query level), - parser_errposition(pstate, - locate_var_of_level(funcexpr, 0; - } - - /* * Disallow aggregate functions in the expression. (No reason to postpone * this check until parseCheckAggregates.) */ diff --git a/src/backend/rewrite/rewriteHandler.c b/src/backend/rewrite/rewriteHandler.c index 04f9622..b7c93c6 100644 --- a/src/backend/rewrite/rewriteHandler.c +++ b/src/backend/rewrite/rewriteHandler.c @@ -18,6 +18,8 @@ #include commands/trigger.h #include nodes/makefuncs.h #include nodes/nodeFuncs.h +#include optimizer/clauses.h +#include optimizer/var.h #include parser/analyze.h #include parser/parse_coerce.h #include parser/parsetree.h @@ -60,6 +62,7 @@ static List *matchLocks(CmdType event, RuleLock *rulelocks, int varno, Query *parsetree); static Query *fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown); +static void substituteRTEFunction(Query *parsetree, RangeTblEntry *rte, int rt_index); /* @@ -1557,6 +1560,14 @@ fireRIRrules(Query *parsetree, List *activeRIRs, bool forUpdatePushedDown) continue; } + if (rte-rtekind == RTE_FUNCTION) + { + + substituteRTEFunction(parsetree, rte, rt_index); + continue; + + } + /* * Joins and other non-relation RTEs can be ignored completely. */ @@ -2239,3 +2250,192 @@ QueryRewrite(Query *parsetree) return results; } + +static void +substituteRTEFunction(Query *parsetree, RangeTblEntry *rte, int rt_index) { + bool is_lateral = false; + FuncExpr *funcExpr; + ListCell *lcell; + + if (!IsA(rte-funcexpr, FuncExpr)) { + elog(ERROR, unrecognized range function type: %d, (int) nodeTag(rte-funcexpr)); + } + + funcExpr = (FuncExpr *) rte-funcexpr; + + if (!funcExpr-args) { + return; + } + + foreach(lcell, funcExpr-args) { + Node *arg = (Node *) lfirst(lcell); + + is_lateral = contain_vars_of_level(arg, 0); + if (is_lateral) { + break; + } + } + + + if (is_lateral) { + List* from; + List *rt_sub = NIL; + Relids rt_sub_ids = NULL; + Query *subquery = makeNode(Query); + List *subquery_from = NIL; + TargetEntry *subquery_te = makeTargetEntry((Expr *) funcExpr, 1, NULL, false); + Node *subquery_quals = NULL; + int i; + + + Assert(parsetree-jointree != NULL IsA(parsetree-jointree, FromExpr)); + + + from = parsetree-jointree-fromlist; + + /* + * Construct FROM list of the substitution subquery. + */ + foreach (lcell, from) { + Node *fnode = lfirst(lcell); + int rt_ind_f = 0; + + if (IsA(fnode,
Re: [HACKERS] lateral function as a subquery - WIP patch
Antonin Houska antonin.hou...@gmail.com writes: For example: SELECT * FROM a, b, func(a.i, b.j) as c, d WHERE a.i=b.j and b.j = d.k and c1 may become SELECT * FROM a, b, subquery as c, d WHERE a.i=b.j and b.j = d.k and c1 where subquery is SELECT func(a.i, b.j) FROM a,b WHERE a.i=b.j Um ... how do you get the subquery result rows to join to only the correct rows of the other tables? This looks like an unconstrained join to me, which is not what I believe the SQL spec for LATERAL to be, and it doesn't seem especially useful either. (If a subquery could do what people wanted, we'd not be hearing all the requests for LATERAL.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] lateral function as a subquery - WIP patch
On Fri, Mar 9, 2012 at 8:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Antonin Houska antonin.hou...@gmail.com writes: For example: SELECT * FROM a, b, func(a.i, b.j) as c, d WHERE a.i=b.j and b.j = d.k and c1 may become SELECT * FROM a, b, subquery as c, d WHERE a.i=b.j and b.j = d.k and c1 where subquery is SELECT func(a.i, b.j) FROM a,b WHERE a.i=b.j Um ... how do you get the subquery result rows to join to only the correct rows of the other tables? This looks like an unconstrained join to me, which is not what I believe the SQL spec for LATERAL to be, and it doesn't seem especially useful either. (If a subquery could do what people wanted, we'd not be hearing all the requests for LATERAL.) I think LATERAL is intended as more or less an unconstrained nested loop with the lateral expression on the inner side, parameterized by value from the outer side. Typically it's a SRF. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas wrote: On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe the correct approach is probably to treat values that need to be propagated into the inner side as executor parameters. ?This could replace the existing, rather crocky, management of values passed into a nestloop inner indexscan. What is the best place to look for the existing, rather crocky code? Follow the second argument of ExecReScan from nodeNestloop to nodeIndexscan. Yeah, this is grotty. ?It appears that the comment introducing ExecReScan() is somewhat incorrect. ?It asserts that exprCtxt is used only Sigh. ...is used only for index scans. However, it's actually also used for bitmap scans (both heap and index) and TID scans. Also, there appears to be an effort by nodes that don't use exprCtxt directly to propagate down through the node tree, which doesn't seem to make much sense if this is only intended to be used on the inner side of a nestloop. Does some comment need to be updated? -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + If your life is a hard drive, Christ can be your backup. + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
2009/10/20 Andrew Gierth and...@tao11.riddles.org.uk: Right now, the only way pg can plan this is to do a hashjoin or mergejoin of the _entire content of big1 and big2_ and join the result against small (again in a hashjoin or mergejoin plan). This becomes excessively slow compared to the ideal plan: nested loop seqscan on small nested loop indexscan on big1 where id=small.id indexscan on big2 where id=small.id (or big1.id which is equiv) (The same argument applies if small is not actually small but has restriction clauses) I have a similar issue on my mind, but is this the same as the topic? SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla) The ideal plan is SeqScan on small with filtering sub query aggregate on large by small.id but the actual plan is full aggregate on large since the planner doesn't push down outer qual to aggregate node. The output will discard almost all of agged's output. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada umi.tan...@gmail.com wrote: 2009/10/20 Andrew Gierth and...@tao11.riddles.org.uk: Right now, the only way pg can plan this is to do a hashjoin or mergejoin of the _entire content of big1 and big2_ and join the result against small (again in a hashjoin or mergejoin plan). This becomes excessively slow compared to the ideal plan: nested loop seqscan on small nested loop indexscan on big1 where id=small.id indexscan on big2 where id=small.id (or big1.id which is equiv) (The same argument applies if small is not actually small but has restriction clauses) I have a similar issue on my mind, but is this the same as the topic? SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla) The ideal plan is SeqScan on small with filtering sub query aggregate on large by small.id but the actual plan is full aggregate on large since the planner doesn't push down outer qual to aggregate node. The output will discard almost all of agged's output. I just tried this and it works for me. create table foo (id serial, name varchar, primary key (id)); create table bar (id serial, foo_id integer references foo (id), name varchar, primary key (id)); insert into foo (name) select random()::varchar from generate_series(1,1000); insert into bar (foo_id, name) select (g%10)+1, random()::varchar from generate_series(1,1) g; explain select * from foo inner join (select foo_id, sum(1) from bar group by 1) x on foo.id = x.foo_id where x.foo_id = 1; ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
2009/12/20 Robert Haas robertmh...@gmail.com: On Sat, Dec 19, 2009 at 12:49 PM, Hitoshi Harada umi.tan...@gmail.com wrote: 2009/10/20 Andrew Gierth and...@tao11.riddles.org.uk: Right now, the only way pg can plan this is to do a hashjoin or mergejoin of the _entire content of big1 and big2_ and join the result against small (again in a hashjoin or mergejoin plan). This becomes excessively slow compared to the ideal plan: nested loop seqscan on small nested loop indexscan on big1 where id=small.id indexscan on big2 where id=small.id (or big1.id which is equiv) (The same argument applies if small is not actually small but has restriction clauses) I have a similar issue on my mind, but is this the same as the topic? SELECT ... FROM small INNER JOIN (SELECT ... FROM large GROUP BY large.id) agged ON small.id = agged.id WHERE small.id IN (bla bla bla) The ideal plan is SeqScan on small with filtering sub query aggregate on large by small.id but the actual plan is full aggregate on large since the planner doesn't push down outer qual to aggregate node. The output will discard almost all of agged's output. I just tried this and it works for me. create table foo (id serial, name varchar, primary key (id)); create table bar (id serial, foo_id integer references foo (id), name varchar, primary key (id)); insert into foo (name) select random()::varchar from generate_series(1,1000); insert into bar (foo_id, name) select (g%10)+1, random()::varchar from generate_series(1,1) g; explain select * from foo inner join (select foo_id, sum(1) from bar group by 1) x on foo.id = x.foo_id where x.foo_id = 1; ...Robert Ah your example works for me, too. My issue is: explain select * from foo inner join (select foo_id, sum(1) from bar group by 1) x on foo.id = x.foo_id where foo.id = 1; where foo.id = 1 (not where x.foo_id = 1). And I now figured out it's another problem. Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Thu, Dec 17, 2009 at 10:13 PM, Robert Haas robertmh...@gmail.com wrote: Another question I have - while generalizing the inner-indexscan machinery is an interesting join optimization technique, I'm thinking that it actually has very little to do with LATERAL. Is there any reason to suppose that one or the other needs to be done first? And the winner is... yes. Or at least, I think so. One of the major reasons why people want LATERAL() is for SRFs, but currently, even if you beat the code into allowing a SRF with an outer reference, the planner can easily be persuaded to run the SRF on the outer side of a join with the dependency as the inner side, which ain't gonna work. (Even you jigger the query so that the planner gets them on the correct sides of the join, the executor fails, but that's a different problem.) The idea Tom came up with back in October is to allow paths to be tagged with a set of rels to which they must in the future be joined in order for the path to be allowable. The point of that exercise was to generalize the current inner-indexscan machinery so that we can create that type of plan in match_unsorted_outer() even when the inner side is a joinrel. But, it strikes me that what we need to allow a function scan with an outer reference is remarkably similar - the function scan can only be used as the inner side of a nestloop with a certain set of rels on the outer side. On the other hand, it's not exactly the same, either. In the case of a construct like A LJ (B IJ C), partial-index scan paths for B and C will require a subsequent nest-join to A to become fully valid, but there will also be other paths that don't. But for something like A, LATERAL (some_srf(A.x)), the ONLY path for the rel defined by some_srf(A.x) has a future-join requirement of {A}. It's not clear to me whether there's anything useful that can be done with this knowledge. Incidentally, the reason why the executor chokes trying to execute a SRF with an outer reference is because ExecEvalVar() craps out trying to dereference a null TupleTableSlot. If I'm understanding this correctly, that, in turn, happens because the variable that we're trying to deference is marked as neither INNER nor OUTER, so it's assumed to be from a scan, but there's no scan node. Going even further from my area of actually understanding what's going on, I think this needs to be fixed by adjusting setrefs.c. Allowing LATERAL(), or for that matter the generalized inner-index scan stuff, will I think mean that set_inner_join_references() will need to handle a lot more cases than it current does. I don't understand this code well enough to begin to speculate as to what should happen here. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: Incidentally, the reason why the executor chokes trying to execute a SRF with an outer reference is because ExecEvalVar() craps out trying to dereference a null TupleTableSlot. If I'm understanding this correctly, that, in turn, happens because the variable that we're trying to deference is marked as neither INNER nor OUTER, so it's assumed to be from a scan, but there's no scan node. Going even further from my area of actually understanding what's going on, I think this needs to be fixed by adjusting setrefs.c. Well, no: we can't handle such references as OUTER vars because the OUTER slot is likely to be in use already in the sub-join. It would be even messier if you wanted several references to different outer relations. I believe the correct approach is probably to treat values that need to be propagated into the inner side as executor parameters. This could replace the existing, rather crocky, management of values passed into a nestloop inner indexscan. The mechanisms that deal with forcing rescans of subplans affected by a changed parameter value would be very helpful here too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: Incidentally, the reason why the executor chokes trying to execute a SRF with an outer reference is because ExecEvalVar() craps out trying to dereference a null TupleTableSlot. If I'm understanding this correctly, that, in turn, happens because the variable that we're trying to deference is marked as neither INNER nor OUTER, so it's assumed to be from a scan, but there's no scan node. Going even further from my area of actually understanding what's going on, I think this needs to be fixed by adjusting setrefs.c. Well, no: we can't handle such references as OUTER vars because the OUTER slot is likely to be in use already in the sub-join. It would be even messier if you wanted several references to different outer relations. Oh. Yeah. I believe the correct approach is probably to treat values that need to be propagated into the inner side as executor parameters. This could replace the existing, rather crocky, management of values passed into a nestloop inner indexscan. The mechanisms that deal with forcing rescans of subplans affected by a changed parameter value would be very helpful here too. What is the best place to look for the existing, rather crocky code? I have to admit that the whole mechanism by which paths get transformed into plans and handed off to the executor is still rather opaque to me. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe the correct approach is probably to treat values that need to be propagated into the inner side as executor parameters. This could replace the existing, rather crocky, management of values passed into a nestloop inner indexscan. What is the best place to look for the existing, rather crocky code? Follow the second argument of ExecReScan from nodeNestloop to nodeIndexscan. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe the correct approach is probably to treat values that need to be propagated into the inner side as executor parameters. This could replace the existing, rather crocky, management of values passed into a nestloop inner indexscan. What is the best place to look for the existing, rather crocky code? Follow the second argument of ExecReScan from nodeNestloop to nodeIndexscan. Yeah, this is grotty. It appears that the comment introducing ExecReScan() is somewhat incorrect. It asserts that exprCtxt is used only -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sat, Dec 19, 2009 at 11:01 PM, Robert Haas robertmh...@gmail.com wrote: On Sat, Dec 19, 2009 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Sat, Dec 19, 2009 at 3:01 PM, Tom Lane t...@sss.pgh.pa.us wrote: I believe the correct approach is probably to treat values that need to be propagated into the inner side as executor parameters. This could replace the existing, rather crocky, management of values passed into a nestloop inner indexscan. What is the best place to look for the existing, rather crocky code? Follow the second argument of ExecReScan from nodeNestloop to nodeIndexscan. Yeah, this is grotty. It appears that the comment introducing ExecReScan() is somewhat incorrect. It asserts that exprCtxt is used only Sigh. ...is used only for index scans. However, it's actually also used for bitmap scans (both heap and index) and TID scans. Also, there appears to be an effort by nodes that don't use exprCtxt directly to propagate down through the node tree, which doesn't seem to make much sense if this is only intended to be used on the inner side of a nestloop. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: Yeah, this is grotty. It appears that the comment introducing ExecReScan() is somewhat incorrect. It asserts that exprCtxt is used only Sigh. ...is used only for index scans. However, it's actually also used for bitmap scans (both heap and index) and TID scans. Yeah, the comment was probably correct when written. Also, there appears to be an effort by nodes that don't use exprCtxt directly to propagate down through the node tree, which doesn't seem to make much sense if this is only intended to be used on the inner side of a nestloop. That's just dead code, which is a good thing because the coverage is pretty incomplete. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sun, Oct 18, 2009 at 2:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: You could probably convince me that a merge join is not going to be too useful (how often can you want a merge join on the inner side of a nested loop? Why not? As Andrew pointed out, what we're really trying to accomplish here is consider sub-join plans that are parameterized by a value obtained from an outer relation. I think we shouldn't artificially limit what we consider. But anyway I think we're on the same page here: what we ought to do is try implementing this scheme without any extra restrictions on what it considers, and see what the performance is like. We can try to limit what it considers if it turns out not to work well in the simplest form. You mention what we ought to do here, so - is this something that you're planning to have a go at? Another question I have - while generalizing the inner-indexscan machinery is an interesting join optimization technique, I'm thinking that it actually has very little to do with LATERAL. Is there any reason to suppose that one or the other needs to be done first? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sun, Oct 18, 2009 at 12:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: You could probably convince me that a merge join is not going to be too useful (how often can you want a merge join on the inner side of a nested loop? Why not? As Andrew pointed out, what we're really trying to accomplish here is consider sub-join plans that are parameterized by a value obtained from an outer relation. I think we shouldn't artificially limit what we consider. Am I understanding you right that a typical case of this might be something like nested loop index scan expecting 1 record merge join index scan on partial index where col = outer.foo and col2 between a and b some other scan or nested loop index scan expecting 1 record merge join index scan on col1,col2 where col1 = outer.foo and col2 between a and b some other scan Ie, where the nested loop is a degenerate nested loop which only expects a single value and provides a parameter which allows some partial index to work or allows for some other index scan by providing a higher order key element? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Greg Stark gsst...@mit.edu writes: nested loop index scan expecting 1 record merge join index scan on col1,col2 where col1 = outer.foo and col2 between a and b some other scan Ie, where the nested loop is a degenerate nested loop which only expects a single value and provides a parameter which allows some partial index to work or allows for some other index scan by providing a higher order key element? Right. I don't see any particular reason to assume the inner path is iterated only once, either. If the key value coming from the outer path is sufficiently useful, this could be a win even with multiple iterations, as compared to having to scan the whole of some large relation or other ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Greg == Greg Stark gsst...@mit.edu writes: Why not? As Andrew pointed out, what we're really trying to accomplish here is consider sub-join plans that are parameterized by a value obtained from an outer relation. I think we shouldn't artificially limit what we consider. Greg Am I understanding you right that a typical case of this might Greg be something like Greg nested loop Greg index scan expecting 1 record Greg merge join Greg index scan on partial index where col = outer.foo and col2 Greg between a and b Greg some other scan no, because you could never pick the partial index at plan time. Greg or Greg nested loop Greg index scan expecting 1 record Greg merge join Greg index scan on col1,col2 where col1 = outer.foo and col2 Greg between a and b Greg some other scan Greg Ie, where the nested loop is a degenerate nested loop which Greg only expects a single value and provides a parameter which Greg allows some partial index to work or allows for some other Greg index scan by providing a higher order key element? The nested loop does NOT have to be degenerate. Consider queries of this form: select * from small left join (big1 join big2 on (big1.id=big2.id)) on (small.id=big1.id); Right now, the only way pg can plan this is to do a hashjoin or mergejoin of the _entire content of big1 and big2_ and join the result against small (again in a hashjoin or mergejoin plan). This becomes excessively slow compared to the ideal plan: nested loop seqscan on small nested loop indexscan on big1 where id=small.id indexscan on big2 where id=small.id (or big1.id which is equiv) (The same argument applies if small is not actually small but has restriction clauses) -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sat, Oct 17, 2009 at 10:09 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: That still leaves a lot of silly paths, though. In many cases, if you're thinking about joining A to {B C} using an index-accelerated path, you'd be just as well off joining to B first and then to C. So it might be that we only need to consider index-accelerated paths when there is no legal join between the LHS and a subset of the RHS. Yeah. If there are no join order constraints, it's always possible to form a plan such that you join a given rel only once you have all the rels needed (to provide values for the inner indexscan) on the lefthand side. I think this is probably the main reason why the issue was not treated in the planner originally. So maybe the way to think about this is as a way of dealing with join order constraints without losing the benefits of inner indexscans. The other problem I see here is that the bottom-up approach that we use in general is going to be difficult to apply here, because the set of paths will vary depending on what parameters are pushed down from the outer side. Well, we deal with that already --- the set of possible inner indexscan paths already varies depending on what the LHS is. I think the point here is that we'd be considering an inner path that's against an LHS that it's not legal to join the inner rel to *directly*. Such a path would only be legal if we later join to that LHS at a higher join level. So we'd be keeping around some tentative paths that might not ever form a valid join plan. Maybe we should turn around the way that inner indexscan paths are made. Currently we form them on-the-fly while considering a valid join combination. Maybe we should build them all at the first level (driving this off the set of available join clauses for each base rel) and mark each such path as requires a join to this other set of rels to be valid. But then we'd go ahead and join such paths to *other* rels, keeping the resulting join paths still marked as requiring the same future join. Once that join actually happens, the resulting path becomes fully valid. Only a join to a proper subset of the future-join requirement would be disallowed meanwhile. I'm not even sure this would be slower or more complicated than what we do now --- if you look at the logic that caches potential inner indexscan plans, it's almost doing this already. It would result in considering more join paths, but only ones that have some plausible use. Wow, that's pretty sneaky. I had to read this email twice before I understood what you were proposing. It sounds to me like it will work. I'm not 100% sure what the impact on planner performance will be, but it's at least plausible that it will be OK. I think you should only ever join an incomplete inner-indexscan path to (1) another inner-indexscan path or (2) the cheapest total path for *exactly* the future-join requirement. Anything else doesn't seem productive. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: I think you should only ever join an incomplete inner-indexscan path to (1) another inner-indexscan path or (2) the cheapest total path for *exactly* the future-join requirement. Anything else doesn't seem productive. I don't see any reason to constrain the form of joins before the future-join requirement. Once you get to the join that satisfies that requirement, obviously you must implement it as a nestloop with the inner-indexscan path on the inside. But there shouldn't be any more constraint beyond that one for that join, either: * you might want a fast-start plan not cheapest total * sort orderings of the outer path might be useful We know that sort ordering of the inner indexscan or its joins won't be useful once we reach the future-join level, so it might be possible to discard paths that are not cheapest total cost below that level. The problem is to distinguish sort orderings that are useful within joins below that level. You could avoid that if you could convince yourself that there's no point in ever doing a mergejoin at such a level, but I don't think I believe that. It may well be that there's no point in being too tense about this issue anyway. The planner will only consider early joins to rels that have join clauses to the rel with the inner-indexscan path. There probably won't be that many of them. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sun, Oct 18, 2009 at 11:30 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I think you should only ever join an incomplete inner-indexscan path to (1) another inner-indexscan path or (2) the cheapest total path for *exactly* the future-join requirement. Anything else doesn't seem productive. I don't see any reason to constrain the form of joins before the future-join requirement. Once you get to the join that satisfies that requirement, obviously you must implement it as a nestloop with the inner-indexscan path on the inside. But there shouldn't be any more constraint beyond that one for that join, either: * you might want a fast-start plan not cheapest total * sort orderings of the outer path might be useful We know that sort ordering of the inner indexscan or its joins won't be useful once we reach the future-join level, so it might be possible to discard paths that are not cheapest total cost below that level. Yeah, this was what I was driving at, but I got turned around in my head and was explaining it incorrectly. The problem is to distinguish sort orderings that are useful within joins below that level. You could avoid that if you could convince yourself that there's no point in ever doing a mergejoin at such a level, but I don't think I believe that. It may well be that there's no point in being too tense about this issue anyway. The planner will only consider early joins to rels that have join clauses to the rel with the inner-indexscan path. There probably won't be that many of them. You could probably convince me that a merge join is not going to be too useful (how often can you want a merge join on the inner side of a nested loop? ... especially when there are partial index scans involved) but I think your last point here is well-taken. We've certainly turned a corner from it's just the exponential growth in the number of join paths to consider that's a problem. :-) ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: You could probably convince me that a merge join is not going to be too useful (how often can you want a merge join on the inner side of a nested loop? Why not? As Andrew pointed out, what we're really trying to accomplish here is consider sub-join plans that are parameterized by a value obtained from an outer relation. I think we shouldn't artificially limit what we consider. But anyway I think we're on the same page here: what we ought to do is try implementing this scheme without any extra restrictions on what it considers, and see what the performance is like. We can try to limit what it considers if it turns out not to work well in the simplest form. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sun, Oct 18, 2009 at 3:57 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: You could probably convince me that a merge join is not going to be too useful (how often can you want a merge join on the inner side of a nested loop? Why not? As Andrew pointed out, what we're really trying to accomplish here is consider sub-join plans that are parameterized by a value obtained from an outer relation. I think we shouldn't artificially limit what we consider. But anyway I think we're on the same page here: what we ought to do is try implementing this scheme without any extra restrictions on what it considers, and see what the performance is like. We can try to limit what it considers if it turns out not to work well in the simplest form. And then when we get done we can consider implementing $SUBJECT, which is no longer what this thread is about at all. :-) ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Tue, Sep 22, 2009 at 11:21 PM, Tom Lane t...@sss.pgh.pa.us wrote: Currently, however, we only consider this possibility when the inner rel is NOT a joinrel. It seems like it might be possible to change this, but it doesn't look straightforward. Well, it's straightforward enough in theory, it's just the exponential growth in the number of join paths to consider that's a problem :-(. I think what we'd need is a heuristic to limit the paths considered. [ picking this up again now that CommitFest is over ] So that I don't have to keep referring to this possibility and similar circumlocution, let's define an index-accelerated path (feel free to suggest a different term) to be the generalization to joinrels of what we now call a nested inner index-scan. In other words, they can only be usefully used on the inner side of a nested loop, where they'll be rescanned with different parameters for each outer row, and they can only ever win when some part of the path involves a *partial* index scan that can used one of the passed parameters as an index qual. For a fixed a set of parameters to be pushed down from the outer side, it seems to me that we could build up a set of index-accelerated paths for a joinrel {A B} by considering the combination of (1) an index-accelerated path for A joined to an index-accelerated path for B, or (2) an index-accelerated path for A joined to the cheapest total path for B. I believe we don't need to worry about path keys because this will only ever be used on the inner side of a nested loop, so the pathkeys will be ignored anyway. In other words, the first heuristic is that you can't build up an index-accelerated path for the joinrel unless there is an index-accelerated path for a least one of its baserels. That still leaves a lot of silly paths, though. In many cases, if you're thinking about joining A to {B C} using an index-accelerated path, you'd be just as well off joining to B first and then to C. So it might be that we only need to consider index-accelerated paths when there is no legal join between the LHS and a subset of the RHS. But I'm not completely sure that I'm right about this, nor whether it's easy to check for. The other problem I see here is that the bottom-up approach that we use in general is going to be difficult to apply here, because the set of paths will vary depending on what parameters are pushed down from the outer side. I think Andrew was suggesting that we not attempt to consider this automatically, but only when the user writes the query in a way that exposes it directly via LATERAL. While that goes against my normal instincts for the planner, it isn't unreasonable as a first step. Possibly, but I'm not sure we have a good enough design sketch at this point to even know whether such a restriction would be useful. Another thought, only semi-related. One of the big use cases for LATERAL in general is to use a set-returning function in the FROM clause that uses vars from a preceding FROM item. I am idly wondering if there's a reason why ExecProject is not its own node type. It seems that we have close to the same logic scattered through a whole bunch of different node types... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: Another thought, only semi-related. One of the big use cases for LATERAL in general is to use a set-returning function in the FROM clause that uses vars from a preceding FROM item. I am idly wondering if there's a reason why ExecProject is not its own node type. You generally need it everywhere. Scan nodes need it because you don't want unused columns propagating upwards, and join nodes need it because the two input tuples have to be unified somehow. We do skip projection ability in a few node types, but I doubt it would be profitable to remove it from the rest. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: That still leaves a lot of silly paths, though. In many cases, if you're thinking about joining A to {B C} using an index-accelerated path, you'd be just as well off joining to B first and then to C. So it might be that we only need to consider index-accelerated paths when there is no legal join between the LHS and a subset of the RHS. Yeah. If there are no join order constraints, it's always possible to form a plan such that you join a given rel only once you have all the rels needed (to provide values for the inner indexscan) on the lefthand side. I think this is probably the main reason why the issue was not treated in the planner originally. So maybe the way to think about this is as a way of dealing with join order constraints without losing the benefits of inner indexscans. The other problem I see here is that the bottom-up approach that we use in general is going to be difficult to apply here, because the set of paths will vary depending on what parameters are pushed down from the outer side. Well, we deal with that already --- the set of possible inner indexscan paths already varies depending on what the LHS is. I think the point here is that we'd be considering an inner path that's against an LHS that it's not legal to join the inner rel to *directly*. Such a path would only be legal if we later join to that LHS at a higher join level. So we'd be keeping around some tentative paths that might not ever form a valid join plan. Maybe we should turn around the way that inner indexscan paths are made. Currently we form them on-the-fly while considering a valid join combination. Maybe we should build them all at the first level (driving this off the set of available join clauses for each base rel) and mark each such path as requires a join to this other set of rels to be valid. But then we'd go ahead and join such paths to *other* rels, keeping the resulting join paths still marked as requiring the same future join. Once that join actually happens, the resulting path becomes fully valid. Only a join to a proper subset of the future-join requirement would be disallowed meanwhile. I'm not even sure this would be slower or more complicated than what we do now --- if you look at the logic that caches potential inner indexscan plans, it's almost doing this already. It would result in considering more join paths, but only ones that have some plausible use. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Wed, Sep 9, 2009 at 11:25 PM, Andrew Gierth and...@tao11.riddles.org.uk wrote: 4. LATERAL allows some optimizations that aren't currently done, either by explicitly rewriting the query, or (in theory) the optimizer itself could consider a lateral plan (I believe Oracle does this). This would apply to queries of this form: SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a); which currently forces the t2/t3 join to occur first even where t1 is small; this could be rewritten with LATERAL as: SELECT ... FROM t1 LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a) where t2.a=t1.a) s ON true; Robert Well, you haven't actually commuted the joins here - how do Robert you have in mind for PostgreSQL to execute this? I'm Robert guessing that it's something like a nest loop with t1 as the Robert outer side and the lateral subquery as the inner side, so Robert that the executor repeatedly executes Robert select * from t2 join t3 on t2.a = t3.a where t2.a = $1? Yup. The current execution plans for this type of query are completely disastrous if t1 is small (or qualified so as to be small) and t2 and t3 are large. Having LATERAL would allow the query to be rewritten to perform reasonably; a bonus would be for the planner to consider the lateral join automatically without requiring it to be explicitly requested. I've been turning this one over in my head. It seems to me that this is very similar to what we already do with inner index-scans, only generalized to joinrels. When we consider nested loop plans in match_unsorted_outer(), we really consider two quite different types of plans - call them parameterized and unparameterized. The unparameterized variants considered regardless of the details of the inner plan, and basically rescan the same identical inner side once per outer row, possibly with a materialize node interposed. The parameterized variants are what the inner index-scan code is looking at: they also involve rescanning the inner path, but not the same set of tuples every time. Instead, we look for cases where an index is available that can support a lookup of the specific values that the outer side needs on each pass as an index condition. Currently, however, we only consider this possibility when the inner rel is NOT a joinrel. It seems like it might be possible to change this, but it doesn't look straightforward. When the inner rel is a baserel, there's really nothing to do except grovel through the indexes looking for something useful, and then estimate the cost of using it. When the inner rel is a joinrel, it's a lot more complicated. A parameterized nested loop might be right even if only some of the rels making up the joinrel are indexed (imagine t2 IJ t3 IJ t4 where t2 and t3 are large and indexed and t4 is small and unindexed). In addition, the relations making up the inner rel could be joined in any order and using a variety of strategies. The path that is best when trying to suck down the ENTIRE inner rel doesn't necessarily bear any resemblence to the plan that is best when trying to suck down only a part of it. To some extent, this seems like a tangent as far as LATERAL is concerned: I think the primary reason why people want LATERAL (certainly, why I want it) is for SRFs, for which there isn't any choice of execution plan anyway. But it's certainly an interesting optimization problem, and I wish I had some better ideas on how to tackle it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: I've been turning this one over in my head. It seems to me that this is very similar to what we already do with inner index-scans, only generalized to joinrels. Right. Currently, however, we only consider this possibility when the inner rel is NOT a joinrel. It seems like it might be possible to change this, but it doesn't look straightforward. Well, it's straightforward enough in theory, it's just the exponential growth in the number of join paths to consider that's a problem :-(. I think what we'd need is a heuristic to limit the paths considered. I think Andrew was suggesting that we not attempt to consider this automatically, but only when the user writes the query in a way that exposes it directly via LATERAL. While that goes against my normal instincts for the planner, it isn't unreasonable as a first step. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert == Robert Haas robertmh...@gmail.com writes: Just to pick up on some points from the discussion: 1. LATERAL has to be explicit because it changes the scope of references. For example, in: ... (select ... FROM (select a AS b), (select b)) ... the b in the second subselect could be an outer reference, but it can't be a reference to the first subquery; whereas in: ... (select ... FROM (select a AS b), LATERAL (select b)) ... the b in the second subselect refers to the result of the first subselect. Robert Can you provide a more complete example? I'm unable to Robert construct a working example of this type. For example: Robert rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1; Robert ERROR: subquery in FROM cannot refer to other relations of same query Robert level at character 50 That looks like a bug to me. The spec is explicit that the inner definition of b is not in scope in the second subquery, and therefore that should parse as an outer reference. 2. LATERAL in general constrains both the join order and the join plan, assuming any lateral references are actually made. Robert Peter seemed to be saying that LATERAL() must syntactically Robert follow the same-level FROM items to which it refers. Is that Robert your understanding also? LATERAL references must be to items defined in the same FROM clause and to the left of the LATERAL. The relevant language of the spec seems to be: a) If TR is contained in a from clause FC with no intervening query expression, then the scope clause SC of TR is the select statement: single row or innermost query specification that contains FC. The scope of a range variable of TR is the select list, where clause, group by clause, having clause, and window clause of SC, together with every lateral derived table that is simply contained in FC and is preceded by TR, and every collection derived table that is simply contained in FC and is preceded by TR, and the join condition of all joined tables contained in SC that contain TR. If SC is the query specification that is the query expression body of a simple table query STQ, then the scope of a range variable of TR also includes the order by clause of STQ. 4. LATERAL allows some optimizations that aren't currently done, either by explicitly rewriting the query, or (in theory) the optimizer itself could consider a lateral plan (I believe Oracle does this). This would apply to queries of this form: SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a); which currently forces the t2/t3 join to occur first even where t1 is small; this could be rewritten with LATERAL as: SELECT ... FROM t1 LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a) where t2.a=t1.a) s ON true; Robert Well, you haven't actually commuted the joins here - how do Robert you have in mind for PostgreSQL to execute this? I'm Robert guessing that it's something like a nest loop with t1 as the Robert outer side and the lateral subquery as the inner side, so Robert that the executor repeatedly executes Robert select * from t2 join t3 on t2.a = t3.a where t2.a = $1? Yup. The current execution plans for this type of query are completely disastrous if t1 is small (or qualified so as to be small) and t2 and t3 are large. Having LATERAL would allow the query to be rewritten to perform reasonably; a bonus would be for the planner to consider the lateral join automatically without requiring it to be explicitly requested. -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On mån, 2009-09-07 at 19:06 -0400, Robert Haas wrote: On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote: Because joins can be reordered, whereas LATERAL creates a kind of syntactic sequence point for join reordering. To pick up your example: But this doesn't [work]: select g, h from generate_series(1,10) g, generate_series(1,g) h; You need to constrain the order of the from items in some way so the g refers to something well-defined. That's what LATERAL does. I don't think so. All joins constrain the join order, I carefully did not say constrain the join order but constraint the order of the from items and a *syntactic* sequence point. If the order of the from items is free, then you could also write the above example as select g, h from generate_series(1,g) h, generate_series(1,10) g; but that would no longer be valid with LATERAL in there. but none of them except FULL JOIN constrain it completely, and this is no exception. FWIW, the spec says somewhere that LATERAL is not allowed at or near an outer join. I'll have to read up on it again. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
David == David Fetter da...@fetter.org writes: I've attempted to search the archives for references to the SQL LATERAL feature, which AIUI is fairly-frequently requested. [snip] Has anyone poked at this at all? David I believe Andrew (RhodiumToad) Gierth is taking a look at David implementing the full LATERAL feature from SQL:2008. I've looked, but I've not actually had time to try any actual work on implementation, so anyone else who fancies a go shouldn't hesitate. Just to pick up on some points from the discussion: 1. LATERAL has to be explicit because it changes the scope of references. For example, in: ... (select ... FROM (select a AS b), (select b)) ... the b in the second subselect could be an outer reference, but it can't be a reference to the first subquery; whereas in: ... (select ... FROM (select a AS b), LATERAL (select b)) ... the b in the second subselect refers to the result of the first subselect. 2. LATERAL in general constrains both the join order and the join plan, assuming any lateral references are actually made. 3. LATERAL specifically IS allowed with left outer joins, though the syntax productions in the spec are sufficiently obscure that this isn't obvious. In general there are (as far as I can tell from the syntax rules) two ways to use it: SELECT ... FROM foo, LATERAL (bar) or SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ... Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded (syntax rule 2 for joined table). 4. LATERAL allows some optimizations that aren't currently done, either by explicitly rewriting the query, or (in theory) the optimizer itself could consider a lateral plan (I believe Oracle does this). This would apply to queries of this form: SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a); which currently forces the t2/t3 join to occur first even where t1 is small; this could be rewritten with LATERAL as: SELECT ... FROM t1 LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a) where t2.a=t1.a) s ON true; -- Andrew (irc:RhodiumToad) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Tue, Sep 8, 2009 at 6:29 PM, Andrew Gierthand...@tao11.riddles.org.uk wrote: David == David Fetter da...@fetter.org writes: I've attempted to search the archives for references to the SQL LATERAL feature, which AIUI is fairly-frequently requested. [snip] Has anyone poked at this at all? David I believe Andrew (RhodiumToad) Gierth is taking a look at David implementing the full LATERAL feature from SQL:2008. I've looked, but I've not actually had time to try any actual work on implementation, so anyone else who fancies a go shouldn't hesitate. Thanks for your thoughts - I appreciate it. Just to pick up on some points from the discussion: 1. LATERAL has to be explicit because it changes the scope of references. For example, in: ... (select ... FROM (select a AS b), (select b)) ... the b in the second subselect could be an outer reference, but it can't be a reference to the first subquery; whereas in: ... (select ... FROM (select a AS b), LATERAL (select b)) ... the b in the second subselect refers to the result of the first subselect. Can you provide a more complete example? I'm unable to construct a working example of this type. For example: rhaas=# select (select 1 from (select a as b) x, (select b) y) from t1; ERROR: subquery in FROM cannot refer to other relations of same query level at character 50 Though this works as expected: rhaas=# select (select 1 from (select a) x, (select b) y) from t1; 2. LATERAL in general constrains both the join order and the join plan, assuming any lateral references are actually made. Peter seemed to be saying that LATERAL() must syntactically follow the same-level FROM items to which it refers. Is that your understanding also? 3. LATERAL specifically IS allowed with left outer joins, though the syntax productions in the spec are sufficiently obscure that this isn't obvious. In general there are (as far as I can tell from the syntax rules) two ways to use it: SELECT ... FROM foo, LATERAL (bar) or SELECT ... FROM foo [LEFT] JOIN LATERAL (bar) ON ... Note that RIGHT JOIN LATERAL and FULL JOIN LATERAL are expressly excluded (syntax rule 2 for joined table). Makes sense to me. 4. LATERAL allows some optimizations that aren't currently done, either by explicitly rewriting the query, or (in theory) the optimizer itself could consider a lateral plan (I believe Oracle does this). This would apply to queries of this form: SELECT ... FROM t1 LEFT JOIN (t2 JOIN t3 ON (t2.a=t3.a)) on (t1.a=t2.a); which currently forces the t2/t3 join to occur first even where t1 is small; this could be rewritten with LATERAL as: SELECT ... FROM t1 LEFT JOIN LATERAL (select * from t2 join t3 on (t2.a=t3.a) where t2.a=t1.a) s ON true; Well, you haven't actually commuted the joins here - how do you have in mind for PostgreSQL to execute this? I'm guessing that it's something like a nest loop with t1 as the outer side and the lateral subquery as the inner side, so that the executor repeatedly executes select * from t2 join t3 on t2.a = t3.a where t2.a = $1? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote: Based on reading through this discussion, it appears that LATERAL is mostly a bit of syntactic sugar that requests that the parser allow you to reference tables at the same query level. Assuming that the necessary executor support were present (which it's currently not), it's unclear to me why one couldn't simply allow such references unconditionally. Because joins can be reordered, whereas LATERAL creates a kind of syntactic sequence point for join reordering. To pick up your example: But this doesn't [work]: select g, h from generate_series(1,10) g, generate_series(1,g) h; You need to constrain the order of the from items in some way so the g refers to something well-defined. That's what LATERAL does. You could argue that the parser could infer the references and the resultant join ordering restrictions automatically, but perhaps it was deemed that an explicit specification would be less error-prone. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote: On Sun, 2009-09-06 at 23:59 -0400, Robert Haas wrote: Based on reading through this discussion, it appears that LATERAL is mostly a bit of syntactic sugar that requests that the parser allow you to reference tables at the same query level. Assuming that the necessary executor support were present (which it's currently not), it's unclear to me why one couldn't simply allow such references unconditionally. Because joins can be reordered, whereas LATERAL creates a kind of syntactic sequence point for join reordering. To pick up your example: But this doesn't [work]: select g, h from generate_series(1,10) g, generate_series(1,g) h; You need to constrain the order of the from items in some way so the g refers to something well-defined. That's what LATERAL does. I don't think so. All joins constrain the join order, but none of them except FULL JOIN constrain it completely, and this is no exception. Consider this query: select * from foo f, generate_series(1, 10) g, generate_series(1, g) h WHERE f.x = g; There are two legal join orderings here: f {g h} and {f g} h, just as there would be for a three-way inner join: select * from foo f, goo g, hoo h WHERE f.x = g.x and g.x = h.x; I believe that the join-order restrictions imposed by a LATERAL SRF are identical to those that would be imposed by an inner join against a table with join clauses referencing the same tables used in the inputs to the SRF. What is different is that there is only one possible method of implementing the join, namely, for each outer row, evaluate the SRF. We can't hash or merge, and we can't swap the inner and outer rels, but we do still have a choice as to whether to join against h before or after joining against f. You could argue that the parser could infer the references and the resultant join ordering restrictions automatically, but perhaps it was deemed that an explicit specification would be less error-prone. Well, the irony is that our current code does already infer the references - and then disallows them. It seems to me that if we implement LATERAL(), we're basically going to be conditionalizing those checks on whether the relevant subexpression has been wrapped in LATERAL or not. Introducing a new keyword would make sense if it were possible to write a query that is valid without LATERAL(), but means something different with LATERAL() - but I'm currently unable to devise such a scenario. Whether this is for want of creativity or because none exists I'm not entirely sure. Any ideas? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas robertmh...@gmail.com writes: On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote: You could argue that the parser could infer the references and the resultant join ordering restrictions automatically, but perhaps it was deemed that an explicit specification would be less error-prone. Well, the irony is that our current code does already infer the references - and then disallows them. Because as often as not, they're mistakes. Please don't think you're smarter than the spec here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Mon, Sep 7, 2009 at 7:47 PM, Tom Lanet...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Mon, Sep 7, 2009 at 3:43 AM, Peter Eisentrautpete...@gmx.net wrote: You could argue that the parser could infer the references and the resultant join ordering restrictions automatically, but perhaps it was deemed that an explicit specification would be less error-prone. Well, the irony is that our current code does already infer the references - and then disallows them. Because as often as not, they're mistakes. Please don't think you're smarter than the spec here. You're frequently the first to criticize the spec, but I have no interest in second-guessing whatever behavior the spec specifies for this construct. I'm just trying to understand it, and as far as I can tell, LATERAL() is just a piece of syntactic sugar that allows expressions within to reference FROM items at the same query level. As far as I can tell, it doesn't change the semantic of any legal queries - it just renders legal notation that otherwise would be rejected. But is my understanding correct? I haven't got a copy of the spec, so that's a bit of a handicap. If someone who does can look this up and comment on how it's supposed to work, I would certainly appreciate that. My understanding of it is currently based on random articles cherry-picked around the Internet and a handful of emails from archives.postgresql.org, which seems a little thin. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert Haas escribió: I haven't got a copy of the spec, so that's a bit of a handicap. If someone who does can look this up and comment on how it's supposed to work, I would certainly appreciate that. My understanding of it is currently based on random articles cherry-picked around the Internet and a handful of emails from archives.postgresql.org, which seems a little thin. http://wiki.postgresql.org/wiki/Developer_FAQ#Where_can_I_get_a_copy_of_the_SQL_standards.3F -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Sep 7, 2009 at 7:47 PM, Tom Lanet...@sss.pgh.pa.us wrote: Because as often as not, they're mistakes. Please don't think you're smarter than the spec here. You're frequently the first to criticize the spec, but I have no interest in second-guessing whatever behavior the spec specifies for this construct. I'm not entirely sure you followed what Tom was getting at here. If you did, feel free to ignore me. I'm just trying to understand it, and as far as I can tell, LATERAL() is just a piece of syntactic sugar that allows expressions within to reference FROM items at the same query level. What I'm gathering is that this may be correct, though I don't know for sure. The point I think Tom was making is that even if it *is* just syntactic sugar, we don't want to allow expressions to reference FROM items at the same query level *unless* LATERAL is specified. Your earlier comments sounded like you would want to implement allowing expressions to refer to FROM items at the same query level without LATERAL being specified. I haven't got a copy of the spec, so that's a bit of a handicap. If someone who does can look this up and comment on how it's supposed to work, I would certainly appreciate that. My understanding of it is currently based on random articles cherry-picked around the Internet and a handful of emails from archives.postgresql.org, which seems a little thin. You can get a 'draft' that's very close to the spec pretty easily.. Just do '??sql' in IRC sometime.. Enjoy, Stephen signature.asc Description: Digital signature
Re: [HACKERS] LATERAL
On Mon, Sep 7, 2009 at 9:12 PM, Stephen Frostsfr...@snowman.net wrote: Robert, * Robert Haas (robertmh...@gmail.com) wrote: On Mon, Sep 7, 2009 at 7:47 PM, Tom Lanet...@sss.pgh.pa.us wrote: Because as often as not, they're mistakes. Please don't think you're smarter than the spec here. You're frequently the first to criticize the spec, but I have no interest in second-guessing whatever behavior the spec specifies for this construct. I'm not entirely sure you followed what Tom was getting at here. If you did, feel free to ignore me. I'm just trying to understand it, and as far as I can tell, LATERAL() is just a piece of syntactic sugar that allows expressions within to reference FROM items at the same query level. What I'm gathering is that this may be correct, though I don't know for sure. The point I think Tom was making is that even if it *is* just syntactic sugar, we don't want to allow expressions to reference FROM items at the same query level *unless* LATERAL is specified. Your earlier comments sounded like you would want to implement allowing expressions to refer to FROM items at the same query level without LATERAL being specified. Fair enough. I think I started to drift off in the direction of making that argument, but it wasn't really my point. The original point I was trying to make is that we may not need to invent any kind of new name-resolution or scoping in order to make LATERAL() work. Instead, LATERAL() can just do everything normally with the exception of not throwing the following errors when they would otherwise be thrown: subquery in FROM cannot refer to other relations of the same query level function expression in FROM cannot refer to other relations of the same query level I'm not sure if I'm right about this, but if I am, it seems likely to be a pretty straightforward change. I haven't got a copy of the spec, so that's a bit of a handicap. If someone who does can look this up and comment on how it's supposed to work, I would certainly appreciate that. My understanding of it is currently based on random articles cherry-picked around the Internet and a handful of emails from archives.postgresql.org, which seems a little thin. You can get a 'draft' that's very close to the spec pretty easily.. Just do '??sql' in IRC sometime.. Thanks for the tip. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
* Robert Haas (robertmh...@gmail.com) wrote: Fair enough. I think I started to drift off in the direction of making that argument, but it wasn't really my point. To be honest, I'm not sure I agree with Tom here on the value of requiring a keyword to tell the system that you really mean what you wrote. On the other hand, it sounds like the spec is pretty clear on this, and I don't feel we should violate it just because we think it's being silly on this point. The original point I was trying to make is that we may not need to invent any kind of new name-resolution or scoping in order to make LATERAL() work. Instead, LATERAL() can just do everything normally with the exception of not throwing the following errors when they would otherwise be thrown: I don't know for sure, but I do hope you're right. I'd certainly love to be able to do this in general.. There's a number of cases where I've had to do the hokey-pokey to get around our lack of ability to do this when using set-returning functions. I'm not sure if I'm right about this, but if I am, it seems likely to be a pretty straightforward change. Please continue to explore it and propose a patch. :) Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] LATERAL
On Mon, Sep 7, 2009 at 10:08 PM, Stephen Frostsfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: Fair enough. I think I started to drift off in the direction of making that argument, but it wasn't really my point. To be honest, I'm not sure I agree with Tom here on the value of requiring a keyword to tell the system that you really mean what you wrote. On the other hand, it sounds like the spec is pretty clear on this, and I don't feel we should violate it just because we think it's being silly on this point. That's pretty much where I am, too, modulo needing to better understand the aforesaid spec. The original point I was trying to make is that we may not need to invent any kind of new name-resolution or scoping in order to make LATERAL() work. Instead, LATERAL() can just do everything normally with the exception of not throwing the following errors when they would otherwise be thrown: I don't know for sure, but I do hope you're right. I'd certainly love to be able to do this in general.. There's a number of cases where I've had to do the hokey-pokey to get around our lack of ability to do this when using set-returning functions. Me too. I'm not sure if I'm right about this, but if I am, it seems likely to be a pretty straightforward change. Please continue to explore it and propose a patch. :) Yeah, that's the not-so-easy part. Gotta grok this executor stuff first before I can even think about implementing this. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] LATERAL
I've attempted to search the archives for references to the SQL LATERAL feature, which AIUI is fairly-frequently requested. Most of the discussion that I've found harks back to 2003, and that seems to discuss more the need for eventual support of the feature than exactly what it is or how to implement it. Looking around the web a little, I found these links: http://farrago.sourceforge.net/design/CollectionTypes.html http://iablog.sybase.com/paulley/2008/07/cross-and-outer-apply/ Based on reading through this discussion, it appears that LATERAL is mostly a bit of syntactic sugar that requests that the parser allow you to reference tables at the same query level. Assuming that the necessary executor support were present (which it's currently not), it's unclear to me why one couldn't simply allow such references unconditionally. We currently explicitly block such references in parse_clause.c, but the comments indicate that this is for reasons of standards-conformance, not semantic ambiguity. It appears that we're not completely without the ability to handle queries of this type. For example, this works: select g, generate_series(1,g) as h from generate_series(1,10) g; But this doesn't: select g, h from generate_series(1,10) g, generate_series(1,g) h; Just for kicks, I tried removing the code that throws a syntax error on the latter query, which resulted in a core dump inside ExecEvalVar(), execQual.c:546, trying to deference a TupleTableSlot. I'm guessing that this is because it can't get access to the right variables - the plan actually looks quite sane: rhaas=# explain select g, h from generate_series(1,10) g, generate_series(1,g) h; QUERY PLAN Nested Loop (cost=0.00..22512.50 rows=100 width=8) - Function Scan on generate_series g (cost=0.00..12.50 rows=1000 width=4) - Function Scan on generate_series h (cost=0.00..12.50 rows=1000 width=4) (3 rows) The first query just shows up a function scan, which means there's some projection operation happening here that isn't exposed at the plan level. I'm guessing that what needs to happen here is that the planner needs to be taught that LATERAL queries can only be implemented as a nested loop (right? unless they're not really using the LATERAL-ness...) and that the executor needs to be taught how to make the vars for the outer side of a nestloop visible to the inner side, when necessary. Has anyone poked at this at all? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LATERAL
On Sun, Sep 06, 2009 at 11:59:22PM -0400, Robert Haas wrote: I've attempted to search the archives for references to the SQL LATERAL feature, which AIUI is fairly-frequently requested. [snip] Has anyone poked at this at all? I believe Andrew (RhodiumToad) Gierth is taking a look at implementing the full LATERAL feature from SQL:2008. Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers