On Fri, Mar 13, 2015 at 2:31 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> Another bit of this that I think we could commit without fretting >> about it too much is the code adding set_join_pathlist_hook. This is >> - I think - analogous to set_rel_pathlist_hook, and like that hook, >> could be used for other purposes than custom plan generation - e.g. to >> delete paths we do not want to use. I've extracted this portion of >> the patch and adjusted the comments; if there are no objections, I >> will commit this bit also. > > I don't object to the concept, but I think that is a pretty bad place > to put the hook call: add_paths_to_joinrel is typically called multiple > (perhaps *many*) times per joinrel and thus this placement would force > any user of the hook to do a lot of repetitive work.
Interesting point. I guess the question is whether a some or all callers are going to actually *want* a separate call for each invocation of add_paths_to_joinrel(), or whether they'll be happy to operate on the otherwise-complete path list. It's true that if your goal is to delete paths, it's probably best to be called just once after the path list is complete, and there might be a use case for that, but I guess it's less useful than for baserels. For a baserel, as long as you don't nuke the sequential-scan path, there is always going to be a way to complete the plan; so this would be a fine way to implement a disable-an-index extension. But for joinrels, it's not so easy to rule out, say, a hash-join here. Neither hook placement is much good for that; the path you want to get rid of may have already dominated paths you want to keep. Suppose you want to add paths - e.g. you have an extension that goes and looks for a materialized view that matches this subtree of the query, and if it finds one, it substitutes a scan of the materialized view for a scan of the baserel. Or, as in KaiGai's case, you have an extension that can perform the whole join in GPU-land and produce the same results we would have gotten via normal execution. Either way, you want - and this is the central point of the whole patch here - to inject a scan path into a joinrel. It is not altogether obvious to me what the best placement for this is. In the materialized view case, you probably need a perfect match between the baserels in the view and the baserels in the joinrel to do anything. There's no point in re-checking that for every innerrels/outerrels combination. I don't know enough about the GPU case to reason about it intelligently; maybe KaiGai can comment. I think the foreign data wrapper join pushdown case, which also aims to substitute a scan for a join, is interesting to think about, even though it's likely to be handled by a new FDW method instead of via the hook. Where should the FDW method get called from? Currently, the FDW method in KaiGai's patch is GetForeignJoinPaths, and that gets called from add_paths_to_joinrel(). The patch at http://www.postgresql.org/message-id/CAEZqfEfy7p=urpwn-q-nngzb8kwhbfqf82ysb9ztfzg7zn6...@mail.gmail.com uses that to implement join pushdown in postgres_fdw; if you have A JOIN B JOIN C all on server X, we'll notice that the join with A and B can be turned into a foreign scan on A JOIN B, and similarly for A-C and B-C. Then, if it turns out that the cheapest path for A-B is the foreign join, and the cheapest path for C is a foreign scan, we'll arrive at the idea of a foreign scan on A-B-C, and we'll realize the same thing in each of the other combinations as well. So, eventually the foreign join gets pushed down. But there's another possible approach: suppose that join_search_one_level, after considering left-sided and right-sided joins and after considering bushy joins, checks whether every relation it's got is from the same foreign server, and if so, asks that foreign server whether it would like to contribute any paths. Would that be better or worse? A disadvantage is that if you've got something like A LEFT JOIN B LEFT JOIN C LEFT JOIN D LEFT JOIN E LEFT JOIN F LEFT JOIN G LEFT JOIN H LEFT JOIN I but none of the joins can be pushed down (say, each join clause calls a non-pushdown-safe function) you'll end up examining a pile of joinrels - at every level of the join tree - and individually rejecting each one. With the build-it-up-incrementally approach, you'll figure that all out at level 2, and then after that there's nothing to do but give up quickly. On the other hand, I'm afraid the incremental approach might miss a trick: consider small LEFT JOIN (big INNER JOIN huge ON big.x = huge.x) ON small.y = big.y AND small.z = huge.z, where all three are foreign tables on the same server. If the output of the big/huge join is big, none of those paths are going to survive at level 2, but the overall join size might be very small, so we surely want a chance to recover at level 3. (We discussed test cases of this form quite a bit in the context of e2fa76d80ba571d4de8992de6386536867250474.) Thoughts? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers