On Sat, Aug 11, 2018 at 9:16 AM, David Rowley <david.row...@2ndquadrant.com> wrote: > I started debugging this to see where things go wrong. I discovered > that add_paths_to_append_rel() is called yet again from > apply_scanjoin_target_to_paths() and this is where it's all going > wrong. The problem is that the gather paths have been tagged onto the > partial paths by this time, so accumulate_append_subpath() has no code > to look through those to fetch the Append/MergeAppend subpaths, so it > just appends the entire path to the subpaths List. This all worked > before 96030f9a481. This commit moved where generate_gather_paths() is > called.
I'm baffled as to why looking through Gather to find Append/MergeAppend subpaths would ever be a sane thing to do. > I'm having a hard time understanding why we need to call > add_paths_to_append_rel() from apply_scanjoin_target_to_paths(). The > name of the function does not seem to imply that we'd do this. The > comment just explains "what" rather than "why" making it a bit > useless. > > /* Build new paths for this relation by appending child paths. */ > if (live_children != NIL) > add_paths_to_append_rel(root, rel, live_children); > > I also think that accumulate_append_subpath() is a bit of a fragile > way of flattening the append rel's paths, but I feel it's probably > apply_scanjoin_target_to_paths() that's at fault here. In my original design, apply_scanjoin_target_to_paths() -- or whatever I called it in the original patch versions -- built an entirely new RelOptInfo, so that we ended up with one RelOptInfo representing the unvarnished result of scan/join planning, and a second one representing the result of applying the scan/join target to that result. However, Amit Kapila was able to demonstrate that this caused a small but noticeable regression in planning speed, which seemed like it might plausibly cause some issues for people running very simple queries. In that original design, if the topmost scan/join rel was partitioned, the new "tlist" upper rel was also partitioned, and in the same way. In short, this was a kind of "partitionwise tlist" feature. For each child of the topmost scan/join rel, we built a child "tlist" rel, which got the same paths but with the correct tlist applied to each one. The path for the toplevel "tlist" upper rel could then be built by appending the children from each child "tlist" rel, or else by the usual method of sticking a Result node over the cheapest path for the topmost scan/join rel. In general, the first method is superior. Instead of a plan like Result -> Append -> (N children each producing the unprojected tlist), you get Append -> (N children each producing the projected tlist), which is cheaper. To avoid the performance overhead of creating a whole new tree of upper rels, I rewrote the code so that it modifies the RelOptInfos in place. That unfortunately makes the logic a bit more confusing, and it sounds from your remarks like I didn't comment it as clearly as might have been possible. But the basic idea is the same: we want the projection to be passed down to the child nodes rather than getting a Result node on top. The commit that did most of this -- 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5 -- also solved a few other problems, as noted in the commit message, so I don't think we want to rip it out wholesale. There might be better ways to do some of it, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company