Re: [HACKERS] Allowing join removals for more join types
On Wed, Jul 16, 2014 at 1:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: I've attached an updated patch which puts in some fast path code for subquery type joins. I'm really not too sure on a good name for this function. I've ended up with query_supports_distinctness() which I'm not that keen on, but I didn't manage to come up with anything better. I've committed this with some mostly but not entirely cosmetic changes. Great! thanks for taking the time to give me guidance on this and commit it too. Simon, thank you for taking the time to review the code. Notably, I felt that pathnode.c was a pretty questionable place to be exporting distinctness-proof logic from, and after some reflection decided to move those functions to analyzejoins.c; that's certainly a better place for them than pathnode.c, and I don't see any superior third alternative. That seems like a good change. Also makes be wonder a bit why clause_sides_match_join is duplicated in joinpath.c and analyzejoins.c, is this just so that it can be inlined? Thanks also for making the change to create_unique_path to make use of the new query_supports_distinctness function. Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: On Wed, Jul 16, 2014 at 1:17 PM, Tom Lane t...@sss.pgh.pa.us wrote: Notably, I felt that pathnode.c was a pretty questionable place to be exporting distinctness-proof logic from, and after some reflection decided to move those functions to analyzejoins.c; that's certainly a better place for them than pathnode.c, and I don't see any superior third alternative. That seems like a good change. Also makes be wonder a bit why clause_sides_match_join is duplicated in joinpath.c and analyzejoins.c, is this just so that it can be inlined? Hm ... probably just didn't seem worth the trouble to try to share the code. It's not really something that either module should be exporting. I guess some case could be made for exporting it from util/restrictinfo.c, but it'd still seem like a bit of a wart. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: I've attached an updated patch which puts in some fast path code for subquery type joins. I'm really not too sure on a good name for this function. I've ended up with query_supports_distinctness() which I'm not that keen on, but I didn't manage to come up with anything better. I've committed this with some mostly but not entirely cosmetic changes. Notably, I felt that pathnode.c was a pretty questionable place to be exporting distinctness-proof logic from, and after some reflection decided to move those functions to analyzejoins.c; that's certainly a better place for them than pathnode.c, and I don't see any superior third alternative. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Wed, Jul 9, 2014 at 12:59 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrow...@gmail.com writes: On 9 July 2014 09:27, Tom Lane t...@sss.pgh.pa.us wrote: On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. Do you mean for non-subqueries? There already is a check to see if the relation has no indexes. Oh, sorry, that was a typo: I meant to write pathnode.c. Specifically, we could skip the translate_sub_tlist step. Admittedly that's not hugely expensive, but as long as we have the infrastructure for a quick check it might be worth doing. TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. Well, that's a real tough one for me as I only added that based on what you told me here: I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. Hah ;-). But the analogy to qual pushdown hadn't occurred to me at the time. Ok, I've removed the check for volatile functions and FOR UPDATE. As far as I know the FOR UPDATE check is pretty much void as of now anyway, since the current state of query_is_distinct_for() demands that there's either a DISTINCT, GROUP BY or just a plain old aggregate without any grouping, which will just return a single row, neither of these will allow FOR UPDATE anyway. True. So the effort here should be probably be more focused on if we should allow the join removal when the subquery contains volatile functions. We should probably think fairly hard on this now as I'm still planning on working on INNER JOIN removals at some point and I'm thinking we should likely be consistent between the 2 types of join for when it comes to FOR UPDATE and volatile functions, and I'm thinking right now that for INNER JOINs that, since they're INNER that we could remove either side of the join. In that case maybe it would be harder for the user to understand why their volatile function didn't get executed. Color me dubious. In exactly what circumstances would it be valid to suppress an inner join involving a sub-select? hmm, probably I didn't think this through enough before commenting as I don't actually have any plans for subselects with INNER JOINs. Though saying that I guess there are cases that can be removed... Anything that queries a single table that has a foreign key matching the join condition, where the subquery does not filter or group the results. Obviously something about the query would have to exist that caused it not to be flattened, perhaps some windowing functions... I've attached an updated patch which puts in some fast path code for subquery type joins. I'm really not too sure on a good name for this function. I've ended up with query_supports_distinctness() which I'm not that keen on, but I didn't manage to come up with anything better. Regards David Rowley subquery_leftjoin_removal_v1.6.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: On Mon, Jul 7, 2014 at 4:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: I poked around to see if we didn't have some code already for that, and soon found that not only do we have such code (equality_ops_are_compatible) but actually almost this entire patch duplicates logic that already exists in optimizer/util/pathnode.c, to wit create_unique_path's subroutines query_is_distinct_for et al. So I'm thinking what this needs to turn into is an exercise in refactoring to allow that logic to be used for both purposes. Well, it seems that might just reduce the patch size a little! I currently have this half hacked up to use query_is_distinct_for, but I see there's no code that allows Const's to exist in the join condition. I had allowed for this in groupinglist_is_unique_on_restrictinfo() and I tested it worked in a regression test (which now fails). I think to fix this, all it would take would be to modify query_is_distinct_for to take a list of Node's rather than a list of column numbers then just add some logic that skips if it's a Const and checks it as it does now if it's a Var Would you see a change of this kind a valid refactor for this patch? I'm a bit skeptical as to whether testing for that case is actually worth any extra complexity. Do you have a compelling use-case? But anyway, if we do want to allow it, why does it take any more than adding a check for Consts to the loops in query_is_distinct_for? It's the targetlist entries where we'd want to allow Consts, not the join conditions. I don't really have a compelling use-case, but you're right, it's just a Const check in query_is_distinct_for(), it seems simple enough so I've included that in my refactor of the patch to use query_is_distinct_for(). This allows the regression tests all to pass again. I've included an updated patch and a delta patch. Now a couple of things to note: 1. The fast path code that exited in join_is_removable() for subquery's when the subquery had no group or distinct clause is now gone. I wasn't too sure that I wanted to assume too much about what query_is_distinct_for may do in the future and I thought if I included some logic in join_is_removable() to fast path, that one day it may fast path wrongly. Perhaps we could protect against this with a small note in query_is_distinct_for(). 2. The patch I submitted here http://www.postgresql.org/message-id/caaphdvrfvkh0p3faoogcckby7fecj9qfanklkx7mwsbcxy2...@mail.gmail.com if that gets accepted then it makes the check for set returning functions in join_is_removable void. Regards David Rowley subquery_leftjoin_removal_v1.5.delta.patch Description: Binary data subquery_leftjoin_removal_v1.5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit skeptical as to whether testing for that case is actually worth any extra complexity. Do you have a compelling use-case? But anyway, if we do want to allow it, why does it take any more than adding a check for Consts to the loops in query_is_distinct_for? It's the targetlist entries where we'd want to allow Consts, not the join conditions. I don't really have a compelling use-case, but you're right, it's just a Const check in query_is_distinct_for(), it seems simple enough so I've included that in my refactor of the patch to use query_is_distinct_for(). This allows the regression tests all to pass again. Meh. I wrote a regression test that expects it is a pretty poor rationale for adding logic. If you can't point to a real-world case where this is important, I'm inclined to take it out. If we were actually serious about exploiting such cases, looking for bare Consts would be a poor implementation anyhow, not least because const-folding has not yet been applied to the sub-select. I think we'd want to do it for any pseudoconstant expression (no Vars, no volatile functions); which is a substantially more expensive test. 1. The fast path code that exited in join_is_removable() for subquery's when the subquery had no group or distinct clause is now gone. I wasn't too sure that I wanted to assume too much about what query_is_distinct_for may do in the future and I thought if I included some logic in join_is_removable() to fast path, that one day it may fast path wrongly. Or put a quick-check subroutine next to query_is_distinct_for(). The code we're skipping here is not so cheap that I want to blow off skipping it. On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. 2. The patch I submitted here http://www.postgresql.org/message-id/caaphdvrfvkh0p3faoogcckby7fecj9qfanklkx7mwsbcxy2...@mail.gmail.com if that gets accepted then it makes the check for set returning functions in join_is_removable void. Right (and done, if you didn't notice already). TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. We have never considered those things to prevent pushdown of quals into a subquery (cf subquery_is_pushdown_safe). I think what we're talking about here is pretty much equivalent to pushing an always-false qual into the subquery; if people haven't complained about that, why should they complain about this? Or to put it in slightly more principled terms, we've attempted to prevent subquery optimization from causing volatile expressions to be evaluated *more* times than the naive reading of the query would suggest, but we have generally not felt that we needed to prevent them from happening *fewer* times. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 9 July 2014 09:27, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: On Tue, Jul 8, 2014 at 4:28 AM, Tom Lane t...@sss.pgh.pa.us wrote: I'm a bit skeptical as to whether testing for that case is actually worth any extra complexity. Do you have a compelling use-case? But anyway, if we do want to allow it, why does it take any more than adding a check for Consts to the loops in query_is_distinct_for? It's the targetlist entries where we'd want to allow Consts, not the join conditions. I don't really have a compelling use-case, but you're right, it's just a Const check in query_is_distinct_for(), it seems simple enough so I've included that in my refactor of the patch to use query_is_distinct_for(). This allows the regression tests all to pass again. Meh. I wrote a regression test that expects it is a pretty poor rationale for adding logic. If you can't point to a real-world case where this is important, I'm inclined to take it out. Ok, I'll pull that logic back out when I get home tonight. If we were actually serious about exploiting such cases, looking for bare Consts would be a poor implementation anyhow, not least because const-folding has not yet been applied to the sub-select. I think we'd want to do it for any pseudoconstant expression (no Vars, no volatile functions); which is a substantially more expensive test. 1. The fast path code that exited in join_is_removable() for subquery's when the subquery had no group or distinct clause is now gone. I wasn't too sure that I wanted to assume too much about what query_is_distinct_for may do in the future and I thought if I included some logic in join_is_removable() to fast path, that one day it may fast path wrongly. Or put a quick-check subroutine next to query_is_distinct_for(). The code we're skipping here is not so cheap that I want to blow off skipping it. Ok, good idea. I'll craft something up tonight along those lines. On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. Do you mean for non-subqueries? There already is a check to see if the relation has no indexes. 2. The patch I submitted here http://www.postgresql.org/message-id/caaphdvrfvkh0p3faoogcckby7fecj9qfanklkx7mwsbcxy2...@mail.gmail.com if that gets accepted then it makes the check for set returning functions in join_is_removable void. Right (and done, if you didn't notice already). Thanks, I noticed that this morning. I'll remove the (now) duplicate check from the patch TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. We have never considered those things to prevent pushdown of quals into a subquery (cf subquery_is_pushdown_safe). I think what we're talking about here is pretty much equivalent to pushing an always-false qual into the subquery; if people haven't complained about that, why should they complain about this? Or to put it in slightly more principled terms, we've attempted to prevent subquery optimization from causing volatile expressions to be evaluated *more* times than the naive reading of the query would suggest, but we have generally not felt that we needed to prevent them from happening *fewer* times. Well, that's a real tough one for me as I only added that based on what you told me here: On 20 May 2014 23:22, Tom Lane t...@sss.pgh.pa.us wrote: I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. regards, tom lane As far as I know the FOR UPDATE check is pretty much void as of now anyway, since the current state of query_is_distinct_for() demands that there's either a DISTINCT, GROUP BY or just a plain old aggregate without any grouping, which will just return a single row, neither of these will allow FOR UPDATE anyway. I really just added the check just to protect the code from possible future additions to query_is_distinct_for() which may add logic to determine uniqueness by some other means. So the effort here should be probably be more focused on if we should allow the join removal when the subquery contains volatile functions. We should probably think fairly hard on this now as I'm still planning on working on INNER JOIN removals at some point and I'm thinking we should likely be consistent between the 2 types of join for when it comes to FOR UPDATE and volatile functions, and I'm thinking right now that for INNER JOINs that, since they're INNER that we could remove either side of the join. In that case maybe it would be harder for the user to understand why their volatile function didn't get executed. Saying that... off the top of my head I can't remember what we'd do in a case like: create view v_a as select a,volatilefunc(a) AS funcresult from a; select a from v_a; Since we didn't select funcresult, do we execute the function?
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrow...@gmail.com writes: On 9 July 2014 09:27, Tom Lane t...@sss.pgh.pa.us wrote: On review it looks like analyzejoins.c would possibly benefit from an earlier fast-path check as well. Do you mean for non-subqueries? There already is a check to see if the relation has no indexes. Oh, sorry, that was a typo: I meant to write pathnode.c. Specifically, we could skip the translate_sub_tlist step. Admittedly that's not hugely expensive, but as long as we have the infrastructure for a quick check it might be worth doing. TBH I find the checks for FOR UPDATE and volatile functions to be questionable as well. Well, that's a real tough one for me as I only added that based on what you told me here: I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. Hah ;-). But the analogy to qual pushdown hadn't occurred to me at the time. As far as I know the FOR UPDATE check is pretty much void as of now anyway, since the current state of query_is_distinct_for() demands that there's either a DISTINCT, GROUP BY or just a plain old aggregate without any grouping, which will just return a single row, neither of these will allow FOR UPDATE anyway. True. So the effort here should be probably be more focused on if we should allow the join removal when the subquery contains volatile functions. We should probably think fairly hard on this now as I'm still planning on working on INNER JOIN removals at some point and I'm thinking we should likely be consistent between the 2 types of join for when it comes to FOR UPDATE and volatile functions, and I'm thinking right now that for INNER JOINs that, since they're INNER that we could remove either side of the join. In that case maybe it would be harder for the user to understand why their volatile function didn't get executed. Color me dubious. In exactly what circumstances would it be valid to suppress an inner join involving a sub-select? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Mon, Jul 7, 2014 at 4:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrow...@gmail.com writes: On 6 July 2014 03:20, Tom Lane t...@sss.pgh.pa.us wrote: Just to note that I've started looking at this, and I've detected a rather significant omission: there's no check that the join operator has anything to do with the subquery's grouping operator. hmm, good point. If I understand this correctly we can just ensure that the same operator is used for both the grouping and the join condition. Well, that's sort of the zero-order solution, but it doesn't work if the join operators are cross-type. I poked around to see if we didn't have some code already for that, and soon found that not only do we have such code (equality_ops_are_compatible) but actually almost this entire patch duplicates logic that already exists in optimizer/util/pathnode.c, to wit create_unique_path's subroutines query_is_distinct_for et al. So I'm thinking what this needs to turn into is an exercise in refactoring to allow that logic to be used for both purposes. Well, it seems that might just reduce the patch size a little! I currently have this half hacked up to use query_is_distinct_for, but I see there's no code that allows Const's to exist in the join condition. I had allowed for this in groupinglist_is_unique_on_restrictinfo() and I tested it worked in a regression test (which now fails). I think to fix this, all it would take would be to modify query_is_distinct_for to take a list of Node's rather than a list of column numbers then just add some logic that skips if it's a Const and checks it as it does now if it's a Var Would you see a change of this kind a valid refactor for this patch? I notice that create_unique_path is not paying attention to the question of whether the subselect's tlist contains SRFs or volatile functions. It's possible that that's a pre-existing bug. *shrug*, perhaps the logic for that is best moved into query_is_distinct_for then? It might save a bug in the future too that way. Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: On Mon, Jul 7, 2014 at 4:15 AM, Tom Lane t...@sss.pgh.pa.us wrote: I poked around to see if we didn't have some code already for that, and soon found that not only do we have such code (equality_ops_are_compatible) but actually almost this entire patch duplicates logic that already exists in optimizer/util/pathnode.c, to wit create_unique_path's subroutines query_is_distinct_for et al. So I'm thinking what this needs to turn into is an exercise in refactoring to allow that logic to be used for both purposes. Well, it seems that might just reduce the patch size a little! I currently have this half hacked up to use query_is_distinct_for, but I see there's no code that allows Const's to exist in the join condition. I had allowed for this in groupinglist_is_unique_on_restrictinfo() and I tested it worked in a regression test (which now fails). I think to fix this, all it would take would be to modify query_is_distinct_for to take a list of Node's rather than a list of column numbers then just add some logic that skips if it's a Const and checks it as it does now if it's a Var Would you see a change of this kind a valid refactor for this patch? I'm a bit skeptical as to whether testing for that case is actually worth any extra complexity. Do you have a compelling use-case? But anyway, if we do want to allow it, why does it take any more than adding a check for Consts to the loops in query_is_distinct_for? It's the targetlist entries where we'd want to allow Consts, not the join conditions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrow...@gmail.com writes: On 6 July 2014 03:20, Tom Lane t...@sss.pgh.pa.us wrote: Just to note that I've started looking at this, and I've detected a rather significant omission: there's no check that the join operator has anything to do with the subquery's grouping operator. hmm, good point. If I understand this correctly we can just ensure that the same operator is used for both the grouping and the join condition. Well, that's sort of the zero-order solution, but it doesn't work if the join operators are cross-type. I poked around to see if we didn't have some code already for that, and soon found that not only do we have such code (equality_ops_are_compatible) but actually almost this entire patch duplicates logic that already exists in optimizer/util/pathnode.c, to wit create_unique_path's subroutines query_is_distinct_for et al. So I'm thinking what this needs to turn into is an exercise in refactoring to allow that logic to be used for both purposes. I notice that create_unique_path is not paying attention to the question of whether the subselect's tlist contains SRFs or volatile functions. It's possible that that's a pre-existing bug. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: Attached is a delta patch between version 1.2 and 1.3, and also a completely updated patch. Just to note that I've started looking at this, and I've detected a rather significant omission: there's no check that the join operator has anything to do with the subquery's grouping operator. I think we need to verify that they are members of the same opclass, as relation_has_unique_index_for does. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 6 July 2014 03:20, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: Attached is a delta patch between version 1.2 and 1.3, and also a completely updated patch. Just to note that I've started looking at this, and I've detected a rather significant omission: there's no check that the join operator has anything to do with the subquery's grouping operator. I think we need to verify that they are members of the same opclass, as relation_has_unique_index_for does. hmm, good point. If I understand this correctly we can just ensure that the same operator is used for both the grouping and the join condition. I've attached a small delta patch which fixes this, and also attached the full updated patch. Regards David Rowley subquery_leftjoin_removal_v1.4.patch Description: Binary data subquery_leftjoin_removal_v1.4_delta.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Sun, Jun 22, 2014 at 11:51 PM, Simon Riggs si...@2ndquadrant.com wrote: On 17 June 2014 11:04, David Rowley dgrowle...@gmail.com wrote: On Wed, Jun 4, 2014 at 12:50 AM, Noah Misch n...@leadboat.com wrote: As a point of procedure, I recommend separating the semijoin support into its own patch. Your patch is already not small; delaying non-essential parts will make the essential parts more accessible to reviewers. In the attached patch I've removed all the SEMI and ANTI join removal code and left only support for LEFT JOIN removal of sub-queries that can be proved to be unique on the join condition by looking at the GROUP BY and DISTINCT clause. Good advice, we can come back for the others later. Example: SELECT t1.* FROM t1 LEFT OUTER JOIN (SELECT value,COUNT(*) FROM t2 GROUP BY value) t2 ON t1.id = t2.value; Looks good on initial look. This gets optimized... EXPLAIN (COSTS OFF) SELECT a.id FROM a LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP BY b.id) b ON a.id = b.id AND b.dummy = 1; does it work with transitive closure like this.. EXPLAIN (COSTS OFF) SELECT a.id FROM a LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1; i.e. c.id is not in the join, but we know from subselect that c.id = b.id and b.id is in the join Well, there's no code that looks at equivalence of the columns in the query, but I'm not quite sure if there would have to be or not as I can't quite think of a way to write that query in a valid way that would cause it not to remove the join. The example query will fail with: ERROR: column b.id must appear in the GROUP BY clause or be used in an aggregate function And if we rewrite it to use c.id in the target list EXPLAIN (COSTS OFF) SELECT a.id FROM a LEFT JOIN (SELECT c.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1; With this one c.id becomes b.id, since we've given the subquery the alias 'b', so I don't think there's case here to optimise anything else. Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
On Wed, Jun 25, 2014 at 10:03 AM, Simon Riggs si...@2ndquadrant.com wrote: On 23 June 2014 12:06, David Rowley dgrow...@gmail.com wrote: It's not clear to me where you get the term sortclause from. This is either the groupclause or distinctclause, but in the test cases you provide this shows this has nothing at all to do with sorting since there is neither an order by or a sorted aggregate anywhere near those queries. Can we think of a better name that won't confuse us in the future? I probably got the word sort from the function targetIsInSortList, which expects a list of SortGroupClause. I've renamed the function to sortlist_is_unique_on_restrictinfo() and renamed the sortclause parameter to sortlist. Hopefully will reduce confusion about it being an ORDER BY clause a bit more. I think sortgroupclauselist might be just a bit too long. What do you think? OK, perhaps I should be clearer. The word sort here seems completely misplaced and we should be using a more accurately descriptive term. It's slightly more than editing to rename things like that, so I'd prefer you cam up with a better name. hmm, I do see what you mean and understand the concern, but I was a bit stuck on the fact it is a list of SortGroupClause after all. After a bit of looking around the source I found a function called grouping_is_sortable which seems to be getting given -groupClause and -distinctClause in a few places around the source. I've ended up naming the function groupinglist_is_unique_on_restrictinfo, but I can drop the word list off of that if that's any better? I did have it named clauselist_is_unique_on_restictinfo for a few minutes, but then I noticed that this was not very suitable since the calling function uses the variable name clause_list for the restrictinfo list, which made it even more confusing. Attached is a delta patch between version 1.2 and 1.3, and also a completely updated patch. Did you comment on the transitive closure question? Should we add a test for that, whether or not it works yet? In my previous email. I could change the the following to use c.id in the targetlist and group by clause, but I'm not really sure it's testing anything new or different. EXPLAIN (COSTS OFF) SELECT a.id FROM a LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP BY b.id) b ON a.id = b.id AND b.dummy = 1; Regards David Rowley Other than that it looks pretty good to commit, so I'll wait a week for other objections then commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services subquery_leftjoin_removal_v1.3.patch Description: Binary data subquery_leftjoin_removal_v1.3_delta.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 26 June 2014 10:01, David Rowley dgrowle...@gmail.com wrote: Did you comment on the transitive closure question? Should we add a test for that, whether or not it works yet? In my previous email. I could change the the following to use c.id in the targetlist and group by clause, but I'm not really sure it's testing anything new or different. EXPLAIN (COSTS OFF) SELECT a.id FROM a LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP BY b.id) b ON a.id = b.id AND b.dummy = 1; OK, agreed, no need to include. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 23 June 2014 12:06, David Rowley dgrow...@gmail.com wrote: It's not clear to me where you get the term sortclause from. This is either the groupclause or distinctclause, but in the test cases you provide this shows this has nothing at all to do with sorting since there is neither an order by or a sorted aggregate anywhere near those queries. Can we think of a better name that won't confuse us in the future? I probably got the word sort from the function targetIsInSortList, which expects a list of SortGroupClause. I've renamed the function to sortlist_is_unique_on_restrictinfo() and renamed the sortclause parameter to sortlist. Hopefully will reduce confusion about it being an ORDER BY clause a bit more. I think sortgroupclauselist might be just a bit too long. What do you think? OK, perhaps I should be clearer. The word sort here seems completely misplaced and we should be using a more accurately descriptive term. It's slightly more than editing to rename things like that, so I'd prefer you cam up with a better name. Did you comment on the transitive closure question? Should we add a test for that, whether or not it works yet? Other than that it looks pretty good to commit, so I'll wait a week for other objections then commit. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
Simon Riggs si...@2ndquadrant.com writes: Other than that it looks pretty good to commit, so I'll wait a week for other objections then commit. I'd like to review this before it goes in. I've been waiting for it to get marked ready for committer though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 24 June 2014 23:48, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: Other than that it looks pretty good to commit, so I'll wait a week for other objections then commit. I'd like to review this before it goes in. I've been waiting for it to get marked ready for committer though. I'll leave it for you then once I'm happy. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 17 June 2014 11:04, David Rowley dgrowle...@gmail.com wrote: On Wed, Jun 4, 2014 at 12:50 AM, Noah Misch n...@leadboat.com wrote: As a point of procedure, I recommend separating the semijoin support into its own patch. Your patch is already not small; delaying non-essential parts will make the essential parts more accessible to reviewers. In the attached patch I've removed all the SEMI and ANTI join removal code and left only support for LEFT JOIN removal of sub-queries that can be proved to be unique on the join condition by looking at the GROUP BY and DISTINCT clause. Good advice, we can come back for the others later. Example: SELECT t1.* FROM t1 LEFT OUTER JOIN (SELECT value,COUNT(*) FROM t2 GROUP BY value) t2 ON t1.id = t2.value; Looks good on initial look. This gets optimized... EXPLAIN (COSTS OFF) SELECT a.id FROM a LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP BY b.id) b ON a.id = b.id AND b.dummy = 1; does it work with transitive closure like this.. EXPLAIN (COSTS OFF) SELECT a.id FROM a LEFT JOIN (SELECT b.id,1 as dummy FROM b INNER JOIN c ON b.id = c.id GROUP BY c.id) b ON a.id = b.id AND b.dummy = 1; i.e. c.id is not in the join, but we know from subselect that c.id = b.id and b.id is in the join -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 22 June 2014 12:51, Simon Riggs si...@2ndquadrant.com wrote: Looks good on initial look. Tests 2 and 3 seem to test the same thing. There are no tests which have multiple column clauselist/sortlists, nor tests for cases where the clauselist is a superset of the sortlist. Test comments should refer to join removal rather than optimization because we may forget which optimization they are there to test. It's not clear to me where you get the term sortclause from. This is either the groupclause or distinctclause, but in the test cases you provide this shows this has nothing at all to do with sorting since there is neither an order by or a sorted aggregate anywhere near those queries. Can we think of a better name that won't confuse us in the future? The comment Since a constant only has 1 value the existence of one here will + * not cause any duplication of the results. We'll simply ignore it! would be better as We can ignore constants since they have only one value and don't affect uniqueness of results. The comment XXX is this comment still true?? can be removed since its just a discussion point. The comment beginning Currently, we only know how to remove left... has rewritten a whole block of text just to add a few words in the middle. We should rewrite the comment so it changes as few lines as possible. Especially when that comment is going to be changed again with your later patches. Better to have it provide a bullet point list of things we know how to remove, so we can just add to it later. Still looks good, other than the above. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Wed, Jun 4, 2014 at 12:50 AM, Noah Misch n...@leadboat.com wrote: As a point of procedure, I recommend separating the semijoin support into its own patch. Your patch is already not small; delaying non-essential parts will make the essential parts more accessible to reviewers. In the attached patch I've removed all the SEMI and ANTI join removal code and left only support for LEFT JOIN removal of sub-queries that can be proved to be unique on the join condition by looking at the GROUP BY and DISTINCT clause. Example: SELECT t1.* FROM t1 LEFT OUTER JOIN (SELECT value,COUNT(*) FROM t2 GROUP BY value) t2 ON t1.id = t2.value; Regards David Rowley subquery_leftjoin_removal_v1.1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Mon, Jun 2, 2014 at 11:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: TBH I think that trying to do anything at all for inner joins is probably a bad idea. The cases where the optimization could succeed are so narrow that it's unlikely to be worth adding cycles to every query to check. I agree that we don't want to add too many cycles to trivial queries but I don't think it's at all fair to say that FK-check joins are a narrow use-case and avoiding that join could be a very nice win. [ thinks for a bit... ] OK, I'd been thinking that to avoid a join the otherwise-unreferenced table would have to have a join column that is both unique and the referencing side of an FK to the other table's join column. But after consuming more caffeine I see I got that backwards and it would need to be the *referenced* side of the FK, which is indeed a whole lot more plausible case. Back when I did web development, this came up for me all the time. I'd create a fact table with lots of id columns referencing dimension tables, and then make a view over it that joined to all of those tables so that it was easy, when reporting, to select whatever bits of information were needed. But the problem was that if the report didn't need all the columns, it still had to pay the cost of computing them, which eventually got to be problematic. That was what inspired me to develop the patch for LEFT JOIN removal, but to really solve the problems I had back then, removing INNER joins as well would have been essential. So, while I do agree that we have to keep the planning cost under control, I'm quite positive about the general concept. I think a lot of users will benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Fri, Jun 6, 2014 at 11:44 AM, Tom Lane t...@sss.pgh.pa.us wrote: Noah Misch n...@leadboat.com writes: On Thu, Jun 05, 2014 at 02:12:33AM +0200, Andres Freund wrote: A bit more crazy, but how about trying trying to plan joins with a added one-time qual that checks the size of the deferred trigger queue? Then we wouldn't even need special case plans. That, too, sounds promising to investigate. Not terribly. You can't actually do join removal in such a case, so it's not clear to me that there's much win to be had. The planner would be at a loss as to what cost to assign such a construct, either. Moreover, what happens if the trigger queue gets some entries after the query starts? In the scripts below I've created a scenario (scenario 1) that the inner query which I've put in a trigger function does see the the referenced table before the RI triggers execute, so it gives 1 row in the SELECT j2_id FROM j1 WHERE NOT EXISTS(SELECT 1 FROM j2 WHERE j2_id = j2.id) query. This works and I agree it's a problem that needs looked at in the patch. I'm also trying to create the situation that you describe where the RI trigger queue gets added to during the query. I'm likely doing it wrong somehow, but I can't see what I'm doing wrong. Here's both scripts. I need help with scenario 2 to create the problem you describe, I can't get my version to give me any stale non-cascaded records. -- Scenario 1: Outer command causes a foreign key trigger to be queued -- and this results in a window of time where we have records -- in the referencing table which don't yet exist in the -- referenced table. DROP TABLE IF EXISTS j1; DROP TABLE IF EXISTS j2; DROP TABLE IF EXISTS records_violating_fkey; CREATE TABLE j2 (id INT NOT NULL PRIMARY KEY); CREATE TABLE j1 ( id INT PRIMARY KEY, j2_id INT NOT NULL REFERENCES j2 (id) MATCH FULL ON DELETE CASCADE ON UPDATE CASCADE ); INSERT INTO j2 VALUES(10),(20); INSERT INTO j1 VALUES(1,10),(2,20); -- create a table to store records that 'violate' the fkey. CREATE TABLE records_violating_fkey (j2_id INT NOT NULL); CREATE OR REPLACE FUNCTION j1_update() RETURNS TRIGGER AS $$ BEGIN RAISE notice 'Trigger fired'; INSERT INTO records_violating_fkey SELECT j2_id FROM j1 WHERE NOT EXISTS(SELECT 1 FROM j2 WHERE j2_id = j2.id); RETURN NEW; END; $$ LANGUAGE plpgsql; CREATE TRIGGER j1_update_trigger BEFORE UPDATE ON j2 FOR EACH ROW EXECUTE PROCEDURE j1_update(); UPDATE j2 SET id = id+1; -- returns 1 row. SELECT * FROM records_violating_fkey; -- -- Scenario 2: Inner command causes a foreign key trigger to be queued. DROP TABLE IF EXISTS j1; DROP TABLE IF EXISTS j2; CREATE TABLE j2 (id INT NOT NULL PRIMARY KEY); CREATE TABLE j1 ( id INT PRIMARY KEY, j2_id INT NOT NULL REFERENCES j2 (id) MATCH FULL ON DELETE CASCADE ON UPDATE CASCADE ); INSERT INTO j2 VALUES(10),(20); INSERT INTO j1 VALUES(1,10),(2,20); CREATE OR REPLACE FUNCTION update_j2(p_id int) RETURNS int AS $$ BEGIN RAISE NOTICE 'Updating j2 id = % to %', p_id, p_id + 1; UPDATE j2 SET id = id + 1 WHERE id = p_id; RETURN 1; END; $$ LANGUAGE plpgsql; -- try and get some records to be returned by causing an update on the record that is not the current record. SELECT * FROM j1 WHERE NOT EXISTS(SELECT 1 FROM j2 WHERE j2_id = id) AND update_j2((SELECT MIN(j2_id) FROM j1 ij1 WHERE ij1.j2_id j1.j2_id)) = 1; Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
On Thu, Jun 05, 2014 at 02:12:33AM +0200, Andres Freund wrote: On 2014-06-04 20:04:07 -0400, Noah Misch wrote: On Wed, Jun 04, 2014 at 10:14:42AM -0400, Tom Lane wrote: It's possible that we could apply the optimization only to queries that have been issued directly by a client, but that seems rather ugly and surprise-filled. ... such as this idea. It's a good start to a fairly-hard problem. FKs are also always valid when afterTriggers-query_depth == -1, such as when all ongoing queries qualified for EXEC_FLAG_SKIP_TRIGGERS. What else? We could teach trigger.c to efficiently report whether a given table has a queued RI trigger. Having done that, when plancache.c is building a custom plan, the planner could ignore FKs with pending RI checks and use the rest. At that point, the surprises will be reasonably-isolated. A bit more crazy, but how about trying trying to plan joins with a added one-time qual that checks the size of the deferred trigger queue? Then we wouldn't even need special case plans. That, too, sounds promising to investigate. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
Noah Misch n...@leadboat.com writes: On Thu, Jun 05, 2014 at 02:12:33AM +0200, Andres Freund wrote: A bit more crazy, but how about trying trying to plan joins with a added one-time qual that checks the size of the deferred trigger queue? Then we wouldn't even need special case plans. That, too, sounds promising to investigate. Not terribly. You can't actually do join removal in such a case, so it's not clear to me that there's much win to be had. The planner would be at a loss as to what cost to assign such a construct, either. Moreover, what happens if the trigger queue gets some entries after the query starts? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Thu, Jun 05, 2014 at 07:44:31PM -0400, Tom Lane wrote: Noah Misch n...@leadboat.com writes: On Thu, Jun 05, 2014 at 02:12:33AM +0200, Andres Freund wrote: A bit more crazy, but how about trying trying to plan joins with a added one-time qual that checks the size of the deferred trigger queue? Then we wouldn't even need special case plans. That, too, sounds promising to investigate. Not terribly. You can't actually do join removal in such a case, so it's not clear to me that there's much win to be had. The planner would be at a loss as to what cost to assign such a construct, either. Yes, those are noteworthy points against it. Moreover, what happens if the trigger queue gets some entries after the query starts? Normally, the query's snapshot will hide modifications that prompted those entries. Searching for exceptions to that rule should be part of this development effort. A related special case came to mind: queries running in the WHEN condition of an AFTER ROW trigger. If the trigger in question precedes the RI triggers, the queue will not yet evidence the triggering modification. Nonetheless, queries in the WHEN clause will see that modification. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Wed, Jun 4, 2014 at 11:50 AM, Noah Misch n...@leadboat.com wrote: On Wed, May 28, 2014 at 08:39:32PM +1200, David Rowley wrote: The attached patch allows join removals for both sub queries with left joins and also semi joins where a foreign key can prove the existence of the record. When a snapshot can see modifications that queued referential integrity triggers for some FK constraint, that constraint is not guaranteed to hold within the snapshot until those triggers have fired. For example, a query running within a VOLATILE function f() in a statement like UPDATE t SET c = f(c) may read data that contradicts FK constraints involving table t. Deferred UNIQUE constraints, which we also do not yet use for deductions in the planner, have the same problem; see commit 0f39d50. This project will need a design accounting for that hazard. I remember reading about some concerns with that here: http://www.postgresql.org/message-id/51e2d505.5010...@2ndquadrant.com But I didn't quite understand the situation where the triggers are delayed. I just imagined that the triggers would have fired by the time the command had completed. If that's not the case, when do the triggers fire? on commit? Right now I've no idea how to check for this in order to disable the join removal. For the deferred unique constraints I'm protecting against that the same way as the left join removal does... It's in the relation_has_foreign_key_for() function where I'm matching the foreign keys up to the indexes on the other relation. As a point of procedure, I recommend separating the semijoin support into its own patch. Your patch is already not small; delaying non-essential parts will make the essential parts more accessible to reviewers. That's a good idea. I think the left join additions would be realistic to get in early in the 9.5 cycle, but the semi and anti joins stuff I know that I'm going to need some more advice for. It makes sense to split them out and get what I can in sooner rather than delaying it for no good reason. Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: On Wed, Jun 4, 2014 at 11:50 AM, Noah Misch n...@leadboat.com wrote: When a snapshot can see modifications that queued referential integrity triggers for some FK constraint, that constraint is not guaranteed to hold within the snapshot until those triggers have fired. I remember reading about some concerns with that here: http://www.postgresql.org/message-id/51e2d505.5010...@2ndquadrant.com But I didn't quite understand the situation where the triggers are delayed. I just imagined that the triggers would have fired by the time the command had completed. If that's not the case, when do the triggers fire? on commit? Right now I've no idea how to check for this in order to disable the join removal. I'm afraid that this point destroys your entire project :-( ... even without deferred constraints, there's no good way to be sure that you're not planning a query that will be run inside some function that can see the results of partially-completed updates. The equivalent problem for unique indexes is tolerable because the default choice is immediate uniqueness enforcement, but there's no similar behavior for FKs. It's possible that we could apply the optimization only to queries that have been issued directly by a client, but that seems rather ugly and surprise-filled. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Wed, Jun 04, 2014 at 10:14:42AM -0400, Tom Lane wrote: David Rowley dgrowle...@gmail.com writes: On Wed, Jun 4, 2014 at 11:50 AM, Noah Misch n...@leadboat.com wrote: When a snapshot can see modifications that queued referential integrity triggers for some FK constraint, that constraint is not guaranteed to hold within the snapshot until those triggers have fired. I remember reading about some concerns with that here: http://www.postgresql.org/message-id/51e2d505.5010...@2ndquadrant.com But I didn't quite understand the situation where the triggers are delayed. I just imagined that the triggers would have fired by the time the command had completed. If that's not the case, when do the triggers fire? on Normally, they fire in AfterTriggerEndQuery(), which falls at the end of a command. The trouble arises there when commands nest. (If the constraint is deferred, they fire just before COMMIT.) commit? Right now I've no idea how to check for this in order to disable the join removal. I'm afraid that this point destroys your entire project :-( ... even without deferred constraints, there's no good way to be sure that you're not planning a query that will be run inside some function that can see the results of partially-completed updates. The equivalent problem for unique indexes is tolerable because the default choice is immediate uniqueness enforcement, but there's no similar behavior for FKs. Let's not give up just yet. There's considerable middle ground between ignoring the hazard and ignoring all FK constraints in the planner, ... It's possible that we could apply the optimization only to queries that have been issued directly by a client, but that seems rather ugly and surprise-filled. ... such as this idea. It's a good start to a fairly-hard problem. FKs are also always valid when afterTriggers-query_depth == -1, such as when all ongoing queries qualified for EXEC_FLAG_SKIP_TRIGGERS. What else? We could teach trigger.c to efficiently report whether a given table has a queued RI trigger. Having done that, when plancache.c is building a custom plan, the planner could ignore FKs with pending RI checks and use the rest. At that point, the surprises will be reasonably-isolated. -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 2014-06-04 20:04:07 -0400, Noah Misch wrote: On Wed, Jun 04, 2014 at 10:14:42AM -0400, Tom Lane wrote: It's possible that we could apply the optimization only to queries that have been issued directly by a client, but that seems rather ugly and surprise-filled. ... such as this idea. It's a good start to a fairly-hard problem. FKs are also always valid when afterTriggers-query_depth == -1, such as when all ongoing queries qualified for EXEC_FLAG_SKIP_TRIGGERS. What else? We could teach trigger.c to efficiently report whether a given table has a queued RI trigger. Having done that, when plancache.c is building a custom plan, the planner could ignore FKs with pending RI checks and use the rest. At that point, the surprises will be reasonably-isolated. A bit more crazy, but how about trying trying to plan joins with a added one-time qual that checks the size of the deferred trigger queue? Then we wouldn't even need special case plans. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Wed, May 28, 2014 at 08:39:32PM +1200, David Rowley wrote: The attached patch allows join removals for both sub queries with left joins and also semi joins where a foreign key can prove the existence of the record. When a snapshot can see modifications that queued referential integrity triggers for some FK constraint, that constraint is not guaranteed to hold within the snapshot until those triggers have fired. For example, a query running within a VOLATILE function f() in a statement like UPDATE t SET c = f(c) may read data that contradicts FK constraints involving table t. Deferred UNIQUE constraints, which we also do not yet use for deductions in the planner, have the same problem; see commit 0f39d50. This project will need a design accounting for that hazard. As a point of procedure, I recommend separating the semijoin support into its own patch. Your patch is already not small; delaying non-essential parts will make the essential parts more accessible to reviewers. Thanks, nm -- Noah Misch EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: I'm not quite there with inner joins yet. I'm still getting my head around just where the join quals are actually stored. TBH I think that trying to do anything at all for inner joins is probably a bad idea. The cases where the optimization could succeed are so narrow that it's unlikely to be worth adding cycles to every query to check. The planning cost of all this is likely to be a concern anyway; but if you can show that you don't add anything unless there are some outer joins in the query, you can at least overcome objections about possibly adding significant overhead to trivial queries. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
* Tom Lane (t...@sss.pgh.pa.us) wrote: David Rowley dgrowle...@gmail.com writes: I'm not quite there with inner joins yet. I'm still getting my head around just where the join quals are actually stored. TBH I think that trying to do anything at all for inner joins is probably a bad idea. The cases where the optimization could succeed are so narrow that it's unlikely to be worth adding cycles to every query to check. I agree that we don't want to add too many cycles to trivial queries but I don't think it's at all fair to say that FK-check joins are a narrow use-case and avoiding that join could be a very nice win. The planning cost of all this is likely to be a concern anyway; but if you can show that you don't add anything unless there are some outer joins in the query, you can at least overcome objections about possibly adding significant overhead to trivial queries. I'm not quite buying this. We're already beyond really trivial queries since we're talking about joins and then considering how expensive joins can be, putting in a bit of effort to eliminate one would be time well worth spending, imv. In any case, I'd certainly suggest David continue to develop this and then we can look at measuring the cost on cases where it was time wasted and on cases where it helps. We may also be able to come up with ways to short-circuit the test and thereby minimize the cost in cases where it won't help. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] Allowing join removals for more join types
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: TBH I think that trying to do anything at all for inner joins is probably a bad idea. The cases where the optimization could succeed are so narrow that it's unlikely to be worth adding cycles to every query to check. I agree that we don't want to add too many cycles to trivial queries but I don't think it's at all fair to say that FK-check joins are a narrow use-case and avoiding that join could be a very nice win. [ thinks for a bit... ] OK, I'd been thinking that to avoid a join the otherwise-unreferenced table would have to have a join column that is both unique and the referencing side of an FK to the other table's join column. But after consuming more caffeine I see I got that backwards and it would need to be the *referenced* side of the FK, which is indeed a whole lot more plausible case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Sun, May 25, 2014 at 5:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: I agree that there are not many cases left to remove the join that remain after is_simple_subquery() has decided not to pullup the subquery. Some of the perhaps more common cases would be having windowing functions in the subquery as this is what you need to do if you want to include the results of a windowing function from within the where clause. Another case, though I can't imagine it would be common, is ORDER BY in the subquery... But for that one I can't quite understand why is_simple_subquery() stops that being flattened in the first place. The problem there is that (in general) pushing qual conditions to below a window function will change the window function's results. If we flatten such a subquery then the outer query's quals can get evaluated before the window function, so that's no good. Another issue is that flattening might cause the window function call to get copied to places in the outer query where it can't legally go, such as the WHERE clause. I should have explained more clearly. I was meaning that a query such as this: SELECT a.* FROM a LEFT OUTER JOIN (SELECT id,LAG(id) OVER (ORDER BY id) AS prev_id FROM b) b ON a.id=b.id; assuming that id is the primary key, could have the join removed. I was just commenting on this as it's probably a fairly common thing to have a subquery with windowing functions in order to perform some sort of filtering of window function columns in the outer query. The other use cases for example: SELECT a.* FROM a LEFT OUTER JOIN (SELECT id FROM b LIMIT 10) b ON a.id=b.id ; Are likely less common. Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: I agree that there are not many cases left to remove the join that remain after is_simple_subquery() has decided not to pullup the subquery. Some of the perhaps more common cases would be having windowing functions in the subquery as this is what you need to do if you want to include the results of a windowing function from within the where clause. Another case, though I can't imagine it would be common, is ORDER BY in the subquery... But for that one I can't quite understand why is_simple_subquery() stops that being flattened in the first place. The problem there is that (in general) pushing qual conditions to below a window function will change the window function's results. If we flatten such a subquery then the outer query's quals can get evaluated before the window function, so that's no good. Another issue is that flattening might cause the window function call to get copied to places in the outer query where it can't legally go, such as the WHERE clause. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Mon, May 19, 2014 at 5:47 PM, Dilip kumar dilip.ku...@huawei.com wrote: So I think now when you are considering this join removal for subqueries then this can consider other case also like unique index inside subquery, because in attached patch unique index is considered only if its RTE_RELATION + if (innerrel-rtekind == RTE_RELATION + relation_has_unique_index_for(root, innerrel, clause_list, NIL, NIL)) return true; I've just had a bit of a look at implementing checks allowing subqueries with unique indexes on the join cols being removed, but I'm hitting a bit of a problem and I'm not quite sure if this is possible at this stage of planning. In the function join_is_removable() the variable innerrel is set to the RelOptInfo of the relation which we're checking if we can remove. In the case of removing subqueries the innerrel-rtekind will be RTE_SUBQUERY. I started going over the pre-conditions that the sub query will need to meet for this to be possible and the list so far looks something like: 1. Only a single base table referenced in the sub query. 2. No FOR UPDATE clause 3. No GROUP BY or DISTINCT clause 4. No set returning functions 5. no volatile functions. 6. has unique index that covers the join conditions or a subset of. I'm hitting a bit of a roadblock on point 1. Here's a snipped from my latest attempt: if (bms_membership(innerrel-relids) == BMS_SINGLETON) { int subqueryrelid = bms_singleton_member(innerrel-relids); RelOptInfo *subqueryrel = find_base_rel(innerrel-subroot, subqueryrelid); if (relation_has_unique_index_for(root, subqueryrel, clause_list, NIL, NIL)) return true; } But it seems that innerrel-subroot is still NULL at this stage of planning and from what I can tell does not exist anywhere else yet and is not generated until make_one_rel() is called from query_planner() Am I missing something major here,or does this sound about right? Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
On 23 May 2014 12:43 David Rowley Wrote, I'm hitting a bit of a roadblock on point 1. Here's a snipped from my latest attempt: if (bms_membership(innerrel-relids) == BMS_SINGLETON) { int subqueryrelid = bms_singleton_member(innerrel-relids); RelOptInfo *subqueryrel = find_base_rel(innerrel-subroot, subqueryrelid); if (relation_has_unique_index_for(root, subqueryrel, clause_list, NIL, NIL)) return true; } But it seems that innerrel-subroot is still NULL at this stage of planning and from what I can tell does not exist anywhere else yet and is not generated until make_one_rel() is called from query_planner() Am I missing something major here,or does this sound about right? It’s true that, till this point of time we haven’t prepared the base relation list for the subquery, and that will be done from make_one_rel while generating the SUBQURY path list. I can think of one solution but I think it will be messy… We get the base relation info directly from subquery Like currently in your patch (shown in below snippet) we are getting the distinct and groupby clause from sub Query, similarly we can get base relation info from (Query-jointree) if (innerrel-rtekind == RTE_SUBQUERY) { Query *query = root-simple_rte_array[innerrelid]-subquery; if (sortclause_is_unique_on_restrictinfo(query, clause_list, query-groupClause) || sortclause_is_unique_on_restrictinfo(query, clause_list, query-distinctClause)) return true; } Regards, Dilip
Re: [HACKERS] Allowing join removals for more join types
On Fri, May 23, 2014 at 8:28 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 23 May 2014 12:43 David Rowley Wrote, I'm hitting a bit of a roadblock on point 1. Here's a snipped from my latest attempt: if (bms_membership(innerrel-relids) == BMS_SINGLETON) { int subqueryrelid = bms_singleton_member(innerrel-relids); RelOptInfo *subqueryrel = find_base_rel(innerrel-subroot, subqueryrelid); if (relation_has_unique_index_for(root, subqueryrel, clause_list, NIL, NIL)) return true; } But it seems that innerrel-subroot is still NULL at this stage of planning and from what I can tell does not exist anywhere else yet and is not generated until make_one_rel() is called from query_planner() Am I missing something major here,or does this sound about right? It’s true that, till this point of time we haven’t prepared the base relation list for the subquery, and that will be done from make_one_rel while generating the SUBQURY path list. I can think of one solution but I think it will be messy… We get the base relation info directly from subquery Like currently in your patch (shown in below snippet) we are getting the distinct and groupby clause from sub Query, similarly we can get base relation info from (Query-jointree) if (innerrel-rtekind == RTE_SUBQUERY) { Query *query = root-simple_rte_array[innerrelid]-subquery; if (sortclause_is_unique_on_restrictinfo(query, clause_list, query-groupClause) || sortclause_is_unique_on_restrictinfo(query, clause_list, query-distinctClause)) return true; } I'm getting the idea that this is just not the right place in planning to do this for subqueries. You seem to be right about the messy part too Here's a copy and paste of the kludge I've ended up with while testing this out: if (list_length(subquery-jointree-fromlist) == 1) { RangeTblEntry *base_rte; RelOptInfo *subqueryrelid; RangeTblRef *rtr = (RangeTblRef *) linitial(subquery-jointree-fromlist); if (!IsA(rtr, RangeTblRef)) return false; base_rte = rt_fetch(rtr-rtindex, subquery-rtable); if (base_rte-relkind != RTE_RELATION) return false; subqueryrelid = build_simple_rel(would have to fake this, rtr-rtindex, RELOPT_BASEREL); I don't have a PlannerInfo to pass to build_simple_rel and it just seems like a horrid hack to create one that we're not going to be keeping. Plus It would be a real shame to have to call build_simple_rel() for the same relation again when we plan the subquery later. I'm getting the idea that looking for unique indexes on the sub query is not worth the hassle for now. Don't get me wrong, they'd be nice to have, but I just think that it's a less common use case and these are more likely to have been pulled up anyway. Unless there's a better way, I think I'm going to spend the time looking into inner joins instead. Regards David Rowley Regards, Dilip
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: I've just had a bit of a look at implementing checks allowing subqueries with unique indexes on the join cols being removed, I'm a bit confused by this statement of the problem. I thought the idea was to recognize that subqueries with DISTINCT or GROUP BY clauses produce known-unique output column(s), which permits join removal in the same way that unique indexes on a base table allow us to deduce that certain columns are known-unique and hence can offer no more than one match for a join. That makes it primarily a syntactic check, which you can perform despite the fact that the subquery hasn't been planned yet (since the parser has done sufficient analysis to determine the semantics of DISTINCT/GROUP BY). Drilling down into the subquery is a whole different matter. For one thing, there's no point in targeting cases in which the subquery would be eligible to be flattened into the parent query, and your proposed list of restrictions seems to eliminate most cases in which it couldn't be flattened. For another, you don't have access to any planning results for the subquery yet, which is the immediate problem you're complaining of. Duplicating the work of looking up a relation's indexes seems like a pretty high price to pay for whatever improvement you might get here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Sat, May 24, 2014 at 3:13 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: I've just had a bit of a look at implementing checks allowing subqueries with unique indexes on the join cols being removed, I'm a bit confused by this statement of the problem. I thought the idea was to recognize that subqueries with DISTINCT or GROUP BY clauses produce known-unique output column(s), which permits join removal in the same way that unique indexes on a base table allow us to deduce that certain columns are known-unique and hence can offer no more than one match for a join. That makes it primarily a syntactic check, which you can perform despite the fact that the subquery hasn't been planned yet (since the parser has done sufficient analysis to determine the semantics of DISTINCT/GROUP BY). Up thread a little Dilip was talking about in addition to checking that if the sub query could be proved to be unique on the join condition using DISTINCT/GROUP BY, we might also check unique indexes in the subquery to see if they could prove the query is unique on the join condition. For example a query such as: SELECT a.* FROM a LEFT JOIN (SELECT b.* FROM b LIMIT 1) b ON a.column = b.colwithuniqueidx The presence of the LIMIT would be enough to stop the subquery being pulled up, but there'd be no reason to why the join couldn't be removed. I think the use case for this is likely a bit more narrow than the GROUP BY/DISTINCT case, so I'm planning on using the time on looking into more common cases such as INNER JOINs where we can prove the existence of the row using a foreign key. Drilling down into the subquery is a whole different matter. For one thing, there's no point in targeting cases in which the subquery would be eligible to be flattened into the parent query, and your proposed list of restrictions seems to eliminate most cases in which it couldn't be flattened. For another, you don't have access to any planning results for the subquery yet, which is the immediate problem you're complaining of. Duplicating the work of looking up a relation's indexes seems like a pretty high price to pay for whatever improvement you might get here. I agree that there are not many cases left to remove the join that remain after is_simple_subquery() has decided not to pullup the subquery. Some of the perhaps more common cases would be having windowing functions in the subquery as this is what you need to do if you want to include the results of a windowing function from within the where clause. Another case, though I can't imagine it would be common, is ORDER BY in the subquery... But for that one I can't quite understand why is_simple_subquery() stops that being flattened in the first place. Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
On Tue, May 20, 2014 at 11:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: I'm also now wondering if I need to do some extra tests in the existing code to ensure that the subquery would have had no side affects. You should probably at least refuse the optimization if the subquery's tlist contains volatile functions. Functions that return sets might be problematic too [ experiments... ] Yeah, they are. This behavior is actually a bit odd: ... regression=# select q1,unnest(array[1,2]) as u from int8_tbl group by 1; q1| u --+--- 4567890123456789 | 1 4567890123456789 | 2 123 | 1 123 | 2 (4 rows) EXPLAIN shows that the reason the last case behaves like that is that the SRF is expanded *after* the grouping step. I'm not entirely sure if that's a bug --- given the lack of complaints, perhaps not. But it shows you can't apply this optimization without changing the existing behavior. I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. regards, tom lane Yeah that is strange indeed. I've made some updates to the patch to add some extra checks for any volatile functions in the target list and set returning functions. The FOR UPDATE currently does not really need an explicit check as I'm currently only supporting removals of sub queries that have either GROUP BY or DISTINCT clauses, none of which allow FOR UPDATE anyway. Regards David Rowley subquery_leftjoin_removal_v0.6.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Mon, May 19, 2014 at 9:22 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 19 May 2014 12:15 David Rowley Wrote, May be we can convert my above example like below à in this case we have unique index on field a and we are limiting it by first 100 tuple (record are already order because of index) Create table t1 (a int, b int); Create table t2 (a int, b int); Create unique index on t2(a); create view v1 as select x.a, y.b from t1 x left join (select t2.a a1, b from t2 limit 100) as y on x.a=y.a1; select a from v1; à for this query I think left join can be removed, But in view since non join field(b) is also projected so this cannot be simplified there. Ok I see what you mean. I guess then that if we did that then we should also support removals of join in subqueries of subqueries. e.g: select t1.* from t1 left join (select t2.uniquecol from (select t2.uniquecol from t2 limit 1000) t2 limit 100) t2 on t1.id = t2.uniquecol On my first round of thoughts on this I thought that we could keep looking into the sub queries until we find that the sub query only queries a single table or it is not a base relation. If we find one with a single table and the sub query has no distinct or group bys then I thought we could just look at the unique indexes similar to how it's done now for a direct table join. But after giving this more thought, I'm not quite sure if a lack of DISTINCT and GROUP BY clause is enough for us to permit removing the join. Would it matter if the sub query did a FOR UPDATE? I started looking at is_simple_subquery() in prepjointree.c but if all those conditions were met then the subquery would have been pulled up to a direct join anyway. I'm also now wondering if I need to do some extra tests in the existing code to ensure that the subquery would have had no side affects. For example: SELECT t1.* FROM t1 LEFT OUTER JOIN (SELECT id,some_function_that_does_something(id) FROM t2 GROUP BY id) t2 ON t1.id = t2.id; Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: I'm also now wondering if I need to do some extra tests in the existing code to ensure that the subquery would have had no side affects. You should probably at least refuse the optimization if the subquery's tlist contains volatile functions. Functions that return sets might be problematic too [ experiments... ] Yeah, they are. This behavior is actually a bit odd: regression=# select q1 from int8_tbl; q1 -- 123 123 4567890123456789 4567890123456789 4567890123456789 (5 rows) regression=# select q1 from int8_tbl group by 1; q1 -- 4567890123456789 123 (2 rows) regression=# select q1,unnest(array[1,2]) as u from int8_tbl; q1| u --+--- 123 | 1 123 | 2 123 | 1 123 | 2 4567890123456789 | 1 4567890123456789 | 2 4567890123456789 | 1 4567890123456789 | 2 4567890123456789 | 1 4567890123456789 | 2 (10 rows) regression=# select q1,unnest(array[1,2]) as u from int8_tbl group by 1; q1| u --+--- 4567890123456789 | 1 4567890123456789 | 2 123 | 1 123 | 2 (4 rows) EXPLAIN shows that the reason the last case behaves like that is that the SRF is expanded *after* the grouping step. I'm not entirely sure if that's a bug --- given the lack of complaints, perhaps not. But it shows you can't apply this optimization without changing the existing behavior. I doubt you should drop a subquery containing FOR UPDATE, either. That's a side effect, just as much as a volatile function would be. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Mon, May 19, 2014 at 5:47 PM, Dilip kumar dilip.ku...@huawei.com wrote: On 18 May 2014 16:38 David Rowley Wrote Sound like a good idea to me.. I have one doubt regarding the implementation, consider the below query Create table t1 (a int, b int); Create table t2 (a int, b int); Create unique index on t2(b); select x.a from *t1 x* left join (select *distinct t2.a a1*, *t2.b b1*from t2) as y on x.a=y.b1; (*because of distinct clause subquery will not be pulled up*) In this case, Distinct clause is used on *t2.a, *but* t2.b *is used for left Join (t2.b have unique index so this left join can be removed). So I think now when you are considering this join removal for subqueries then this can consider other case also like unique index inside subquery, because in attached patch unique index is considered only if its RTE_RELATION + if (innerrel-rtekind == RTE_RELATION + relation_has_unique_index_for(root, innerrel, clause_list, NIL, NIL)) return true; Correct me if I am missing something.. I think you are right here, it would be correct to remove that join, but I also think that the query in question could be quite easily be written as: select t1.a from t1 left join t2 on t1.a=t2.b; Where the join WILL be removed. The distinct clause here technically is a no-op due to all the columns of a unique index being present in the clause. Can you think of a use case for this where the sub query couldn't have been written out as a direct join to the relation? What would be the reason to make it a sub query with the distinct? or have I gotten something wrong here? I'm also thinking here that if we made the join removal code remove these joins, then the join removal code would end up smarter than the rest of the code as the current code seems not to remove the distinct clause for single table queries where a subset of the columns of a distinct clause match all the columns of a unique index. create table pktest (id int primary key); explain (costs off) select distinct id from pktest; QUERY PLAN -- HashAggregate Group Key: id - Seq Scan on pktest This could have been rewritten to become: select id from pktest I feel if we did that sort of optimisation to the join removals, then I'd guess we'd better put it into other parts of the code too, perhaps something that could do this should be in the re-writer then once the join removal code gets to it, the join could be removed. Can you think of a similar example where the subquery could not have been written as a direct join to the relation? Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
On 19 May 2014 12:15 David Rowley Wrote, I think you are right here, it would be correct to remove that join, but I also think that the query in question could be quite easily be written as: select t1.a from t1 left join t2 on t1.a=t2.b; Where the join WILL be removed. The distinct clause here technically is a no-op due to all the columns of a unique index being present in the clause. Can you think of a use case for this where the sub query couldn't have been written out as a direct join to the relation? What would be the reason to make it a sub query with the distinct? or have I gotten something wrong here? I'm also thinking here that if we made the join removal code remove these joins, then the join removal code would end up smarter than the rest of the code as the current code seems not to remove the distinct clause for single table queries where a subset of the columns of a distinct clause match all the columns of a unique index. Can you think of a similar example where the subquery could not have been written as a direct join to the relation? I think, you are write that above given query and be written in very simple join. But what my point is, In any case when optimizer cannot pull up the subquery (because it may have aggregate, group by, order by, limit, distinct etc.. clause), That time even, It will check Whether join is removable or not only when distinct or group by clause is there if it has unique index then it will not be check, is there no scenario where it will be useful ? May be we can convert my above example like below -- in this case we have unique index on field a and we are limiting it by first 100 tuple (record are already order because of index) Create table t1 (a int, b int); Create table t2 (a int, b int); Create unique index on t2(a); create view v1 as select x.a, y.b from t1 x left join (select t2.a a1, b from t2 limit 100) as y on x.a=y.a1; select a from v1; -- for this query I think left join can be removed, But in view since non join field(b) is also projected so this cannot be simplified there. In your patch, anyway we are having check for distinct and group clause inside subquery, can’t we have check for unique index also ? Regards, Dilip
Re: [HACKERS] Allowing join removals for more join types
On Sat, May 17, 2014 at 8:57 PM, David Rowley dgrowle...@gmail.com wrote: I'm currently in the early stages of looking into expanding join removals. Currently left outer joins can be removed if none of the columns of the table are required for anything and the table being joined is a base table that contains a unique index on all columns in the join clause. The case I would like to work on is to allow sub queries where the query is grouped by or distinct on all of the join columns. Take the following as an example: CREATE TABLE products (productid integer NOT NULL, code character varying(32) NOT NULL); CREATE TABLE sales (saleid integer NOT NULL, productid integer NOT NULL, qty integer NOT NULL); CREATE VIEW product_sales AS SELECT p.productid, p.code, s.qty FROM (products p LEFT JOIN ( SELECT sales.productid, sum(sales.qty) AS qty FROM sales GROUP BY sales.productid) s ON ((p.productid = s.productid))); If a user does: SELECT productid,code FROM product_sales; Then, if I'm correct, the join on sales can be removed. Attached is a patch which implements this. It's still a bit rough around the edges and some names could likely do with being improved, but it at least seems to work with all of the test cases that I've thrown at it so far. Comments are welcome, but the main purpose of the email is so I can register the patch for the June commitfest. Regards David Rowley subquery_leftjoin_removal_v0.5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On 18 May 2014 16:38 David Rowley Wrote Sound like a good idea to me.. I have one doubt regarding the implementation, consider the below query Create table t1 (a int, b int); Create table t2 (a int, b int); Create unique index on t2(b); select x.a from t1 x left join (select distinct t2.a a1, t2.b b1 from t2) as y on x.a=y.b1; (because of distinct clause subquery will not be pulled up) In this case, Distinct clause is used on t2.a, but t2.b is used for left Join (t2.b have unique index so this left join can be removed). So I think now when you are considering this join removal for subqueries then this can consider other case also like unique index inside subquery, because in attached patch unique index is considered only if its RTE_RELATION + if (innerrel-rtekind == RTE_RELATION + relation_has_unique_index_for(root, innerrel, clause_list, NIL, NIL)) return true; Correct me if I am missing something.. CREATE TABLE products (productid integer NOT NULL, code character varying(32) NOT NULL); CREATE TABLE sales (saleid integer NOT NULL, productid integer NOT NULL, qty integer NOT NULL); CREATE VIEW product_sales AS SELECT p.productid, p.code, s.qty FROM (products p LEFT JOIN ( SELECT sales.productid, sum(sales.qty) AS qty FROM sales GROUP BY sales.productid) s ON ((p.productid = s.productid))); If a user does: SELECT productid,code FROM product_sales; Then, if I'm correct, the join on sales can be removed. Attached is a patch which implements this. It's still a bit rough around the edges and some names could likely do with being improved, but it at least seems to work with all of the test cases that I've thrown at it so far. Comments are welcome, but the main purpose of the email is so I can register the patch for the June commitfest.
Re: [HACKERS] Allowing join removals for more join types
On Sat, May 17, 2014 at 8:57 PM, David Rowley dgrowle...@gmail.com wrote: I'm currently in the early stages of looking into expanding join removals. As I said above, I'm in the early stages of looking at this and I'm currently a bit confused. Basically I've put a breakpoint at the top of the join_is_removable function and I can see that innerrel-rtekind is RTE_SUBQUERY for my query, so the function is going to return false. So what I need to so is get access to innerrel-subroot-parse so that I can look at groupClause and distinctClause. The thing is innerrel-subroot is NULL in this case, but I see a comment for subroot saying subroot - PlannerInfo for subquery (NULL if it's not a subquery) so I guess this does not also mean subroot - PlannerInfo for subquery (NOT NULL if it's a subquery)? Has anyone got any pointers to where I might be able to get the Query details for the subquery? These structures are quite new to me. I think I've managed to answer my own question here. But please someone correct me if this sounds wrong. It looks like the existing join removals are done quite early in the planning and redundant joins are removed before any subqueries from that query are planned. So this innerrel-subroot-parse has not been done yet. It seems to be done later in query_planner() when make_one_rel() is called. The best I can come up with on how to implement this is to have 2 stages of join removals. Stage 1 would be the existing stage that attempts to remove joins from non subqueries. Stage 2 would happen just after make_one_rel() is called from query_planner(), this would be to attempt to remove any subqueries that are not need, and if it managed to remove any it would force a 2nd call to make_one_rel(). Does this sound reasonable or does it sound like complete non-sense? Regards David Rowley
Re: [HACKERS] Allowing join removals for more join types
David Rowley dgrowle...@gmail.com writes: It looks like the existing join removals are done quite early in the planning and redundant joins are removed before any subqueries from that query are planned. So this innerrel-subroot-parse has not been done yet. It seems to be done later in query_planner() when make_one_rel() is called. It's true that we don't plan the subquery till later, but I don't see why that's important here. Everything you want to know is available from the subquery parsetree; so just look at the RTE, don't worry about how much of the RelOptInfo has been filled in. The best I can come up with on how to implement this is to have 2 stages of join removals. Stage 1 would be the existing stage that attempts to remove joins from non subqueries. Stage 2 would happen just after make_one_rel() is called from query_planner(), this would be to attempt to remove any subqueries that are not need, and if it managed to remove any it would force a 2nd call to make_one_rel(). That sounds like a seriously bad idea. For one thing, it blows the opportunity to not plan the subquery in the first place. For another, most of these steps happen in a carefully chosen order because there are interdependencies. You can't just go back and re-run some earlier processing step. A large fraction of the complexity of analyzejoins.c right now arises from the fact that it has to undo some earlier processing; that would get enormously worse if you delayed it further. BTW, just taking one step back ... this seems like a pretty specialized requirement. Are you sure it wouldn't be easier to fix your app to not generate such silly queries? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allowing join removals for more join types
On Sun, May 18, 2014 at 2:55 AM, Tom Lane t...@sss.pgh.pa.us wrote: David Rowley dgrowle...@gmail.com writes: It looks like the existing join removals are done quite early in the planning and redundant joins are removed before any subqueries from that query are planned. So this innerrel-subroot-parse has not been done yet. It seems to be done later in query_planner() when make_one_rel() is called. It's true that we don't plan the subquery till later, but I don't see why that's important here. Everything you want to know is available from the subquery parsetree; so just look at the RTE, don't worry about how much of the RelOptInfo has been filled in. Thanks. I think I've found what you're talking about in PlannerInfo simple_rte_array. That's got the ball rolling. The best I can come up with on how to implement this is to have 2 stages of join removals. Stage 1 would be the existing stage that attempts to remove joins from non subqueries. Stage 2 would happen just after make_one_rel() is called from query_planner(), this would be to attempt to remove any subqueries that are not need, and if it managed to remove any it would force a 2nd call to make_one_rel(). That sounds like a seriously bad idea. For one thing, it blows the opportunity to not plan the subquery in the first place. For another, most of these steps happen in a carefully chosen order because there are interdependencies. You can't just go back and re-run some earlier processing step. A large fraction of the complexity of analyzejoins.c right now arises from the fact that it has to undo some earlier processing; that would get enormously worse if you delayed it further. Agreed, but at the time I didn't know that the subquery information was available elsewhere. BTW, just taking one step back ... this seems like a pretty specialized requirement. Are you sure it wouldn't be easier to fix your app to not generate such silly queries? Well, couldn't you say the same about any join removals? Of course the query could be rewritten to not reference that relation, but there are cases where removing the redundant join might be more silly, for example a fairly complex view may exist and many use cases for the view don't require all of the columns. It might be a bit of a pain to maintain a whole series of views with each required subset of columns instead of just maintaining a single view and allow callers to use what they need from it. I came across the need for this at work this week where we have a grid in a UI where the users can select columns that they need to see in the grid. The data in each grid is supplied by a single view which contains all the possible columns that they might need, if the user is just using a narrow subset of those columns then it could seem quite wasteful to do more than is required. Regards David Rowley