On Sun, Feb 28, 2016 at 3:03 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > I'm pretty pleased with the way this turned out. grouping_planner() is > about half the length it was before, and much more straightforward IMO. > planagg.c no longer seems like a complete hack; it's a reasonable > prototype for injecting nontraditional implementation paths into > aggregation or other late planner stages, and grouping_planner() doesn't > need to know about it.
Thanks for working on this. Some review comments: - I think all of the new path creation functions should bubble up parallel_degree from their subpath. It's fine for LockRows to use 0 for now, because we have no hope of supporting write operations in parallel mode any time soon, but it's one less thing to change later. The others should really all use the subpath's parallel degree. Actually, they should use the subpath's parallel degree or maybe a larger number if they add a lot of CPU work to the query, but don't have any principled way to model that right now, so just copying the value is probably as good as we're going to do for the moment. - RollupPath seems like a poor choice of name, if nothing else. You would expect that it would be related to GROUP BY ROLLUP() but of course that's really the same thing as GROUP BY GROUPING SETS () or GROUP BY CUBE (), and the fundamental operation is actually GROUPING SETS, not ROLLUP. - It's not entirely related to this patch, but I'm starting to wonder if we've made the wrong bet about target lists. It seems to me that there's a huge difference between a projection which simply throws away columns we don't need and one which actually computes something, and maybe those cases ought to be treated differently instead of saying "well, it's a target list". It strikes me that there are probably execution-time optimizations that are possible in the former case, and maybe a more compact representation of the projection operation as well. I can't shake the feeling that our extensive use of lists can't be the best thing ever for performance. - A related point that is more connected to this patch is that you've added 13 (!) new calls to disuse_physical_tlist, and 8 of those are marked with /* XXX hack: need original tlist with sortgroupref marking */. I don't quite understand what's going on there. I think if we're going to be stuck with that hack we at least need some comments explaining what is going on there. What has caused us to suddenly need these calls when we didn't before, and why these places and not some others? - For SortPath, you mention that the SortGroupClauses representation isn't currently used. It's not clear to me that it ever would be; what would be the case where that might be useful? At any rate, I'd be inclined to rip it out for now; you can always put it back later. - create_distinct_paths() disables the hash path if it seems like it would exceed work_mem, unless the sort path isn't viable. But there's something that feels a bit uncomfortable about this. Suppose the sort path isn't viable but some other kind of future path is viable. It seems like it would be better to restructure this a bit so that the decision about whether to add the hash path is based on whether there are any other paths in the rel when we reach the bottom of the function. create_grouping_paths() has a similar issue. In general, and I'm sure this is not a huge surprise, most of this looks very good to me. I think the design is sound and that, if the performance is OK, we ought to move forward with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers