Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Thu, Mar 17, 2016 at 2:31 PM, Tom Lane wrote: >> So what I would now propose is >> >> typedef void (*create_upper_paths_hook_type) (PlannerInfo *root, >> UpperRelationKind stage, >> RelOptInfo *input_rel); >> >> and have this invoked at each stage right before we call >> create_grouping_paths, create_window_paths, etc. > Works for me. Now that the commitfest crunch is over, I went back and took care of this loose end. I ended up passing the output_rel as well for each step, to save hook users from having to look that up for themselves. The hook calls are placed immediately before set_cheapest() for each upper rel, so that all core-built Paths are available for inspection. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
> -Original Message- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Kouhei Kaigai > Sent: Saturday, March 19, 2016 8:57 AM > To: Tom Lane > Cc: Robert Haas; Petr Jelinek; David Rowley; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP: Upper planner pathification > > > -Original Message- > > From: pgsql-hackers-ow...@postgresql.org > > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane > > Sent: Friday, March 18, 2016 11:44 PM > > To: Kaigai Kouhei(海外 浩平) > > Cc: Robert Haas; Petr Jelinek; David Rowley; pgsql-hackers@postgresql.org > > Subject: Re: [HACKERS] WIP: Upper planner pathification > > > > Kouhei Kaigai writes: > > > On Wed, Mar 16, 2016 at 2:47 PM, Tom Lane wrote: > > >> I do not, however, like the proposal to expose wflists and so forth. > > >> Those are internal data structures in grouping_planner and I absolutely > > >> refuse to promise that they're going to stay stable. > > > > > Hmm... It's not easy to imagine a case that extension wants own idea > > > to extract window functions from the target list and select active > > > windows, even if extension wants to have own executor and own cost > > > estimation logic. > > > In case when extension tries to add WindowPath + CustomPath(Sort), > > > extension is interested in alternative sort task, but not window > > > function itself. It is natural to follow the built-in implementation, > > > thus, it motivates extension author to take copy & paste the code. > > > select_active_windows() is static, so extension needs to have same > > > routine on their side. > > > > Well, to be perfectly blunt about it, I have said from day one that this > > notion that a CustomScan extension will be able to cause arbitrary planner > > behavior changes is loony. We are simply not going to drop a hook into > > every tenth line of the planner for you, nor de-static-ify every internal > > function, nor (almost equivalently) expose the data structures those > > functions produce, because it would cripple core planner development to > > try to keep the implied APIs stable. And I continue to maintain that any > > actually-generally-useful ideas would be better handled by submitting them > > as patches to the core planner, rather than trying to implement them in an > > arms-length extension. > > > > In the case at hand, I notice that the WindowFuncLists struct is > > actually from find_window_functions in clauses.c, so an extension > > that needed to get hold of that would be unlikely to do any copying > > and pasting anyhow -- it'd just call find_window_functions again. > > (That only needs to search the targetlist, so it's not that expensive.) > > The other lists you mention are all tightly tied to specific, and not > > terribly well-designed, implementation strategies for grouping sets and > > window functions. Those are *very* likely to change in the near future; > > and even if they don't, I'm skeptical that an external implementor of > > grouping sets or window functions would want to use exactly those same > > implementation strategies. Maybe the only reason you're there at all > > is you want to be smarter about the order of doing window functions, > > for example. > > > > So I really don't want to export any of that stuff. > > > Hmm. I could understand we have active development around this area > thus miscellaneous internal data structure may not be enough stable > to expose the extension. > Don't you deny recall the discussion once implementation gets calmed > down on the future development cycle? > > > As for other details of the proposed patch, I was intending to put > > all the hook calls into grouping_planner for consistency, rather than > > scattering them between grouping_planner and its subroutines. So that > > would probably mean calling the hook for a given step *before* we > > generate the core paths for that step, not after. Did you have a > > reason to want the other order? (If you say "so the hook can look > > at the core-made paths", I'm going to say "that's a bad idea". It'd > > further increase the coupling between a CustomScan extension and core.) > > > No deep reason. I just followed the manner in scan/join path hook; that > calls extension once the core feature adds built-in path nodes. > Ah, I oversight a deep reason. ForeignScan/CustomScan may have an alternative execution path if extension s
Re: [HACKERS] WIP: Upper planner pathification
On 3/22/16 7:28 AM, Michael Paquier wrote: On Mon, Mar 21, 2016 at 7:55 AM, Jim Nasby wrote: On 3/17/16 9:01 AM, Robert Haas wrote: I think that there are an awful lot of cases where extension authors haven't been able to quite do what they want to do without core changes because they couldn't get control in quite the right place; or they could do it but they had to cut-and-paste a lot of code. FWIW, I've certainly run into this at least once, maybe twice. The case I can think of offhand is doing function resolution with variant. I don't remember the details anymore, but my recollection is that to get what I needed I would have needed to copy huge swaths of the rewrite code. Amen, I have been doing that a couple of days ago with some elog stuff. Any ideas on ways to address this? Adding more hooks in random places every time we stumble across something doesn't seem like a good method. One thing I've wondered about is making it easier to find specific constructs in a parsed query so that you can make specific modifications. I recall looking at that once and finding a roadblock (maybe a bunch of functions were static?) -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 21, 2016 at 7:55 AM, Jim Nasby wrote: > On 3/17/16 9:01 AM, Robert Haas wrote: >> >> I think that >> there are an awful lot of cases where extension authors haven't been >> able to quite do what they want to do without core changes because >> they couldn't get control in quite the right place; or they could do >> it but they had to cut-and-paste a lot of code. > > FWIW, I've certainly run into this at least once, maybe twice. The case I > can think of offhand is doing function resolution with variant. I don't > remember the details anymore, but my recollection is that to get what I > needed I would have needed to copy huge swaths of the rewrite code. Amen, I have been doing that a couple of days ago with some elog stuff. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 3/17/16 9:01 AM, Robert Haas wrote: I think that there are an awful lot of cases where extension authors haven't been able to quite do what they want to do without core changes because they couldn't get control in quite the right place; or they could do it but they had to cut-and-paste a lot of code. FWIW, I've certainly run into this at least once, maybe twice. The case I can think of offhand is doing function resolution with variant. I don't remember the details anymore, but my recollection is that to get what I needed I would have needed to copy huge swaths of the rewrite code. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Wed, Mar 16, 2016 at 12:36 PM, Tom Lane wrote: > Robert Haas writes: > > That doesn't update the cost of the subpath, which it probably needs to > > do. I wonder if this shouldn't be implemented by recursing. > > > if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, > > false)) > > apply_projection_to_path(root, something, ((GatherPath *) > > path)->subpath, target); > > > Tom, any comments? I think it would be smart to push this into 9.6. > > I agree with the idea that this problem should be solved, but there's > nothing much to like about the particular details here. If we're going > to push down stuff into the workers, I think adding a Result if needed > to do that is something we'd definitely want it to do. So more > along the lines of > > if (IsA(path, GatherPath) && > !has_parallel_hazard((Node *) target->exprs, false)) > { > GatherPath *gpath = (GatherPath *) path; > > gpath->subpath = > apply_projection_to_path(root, > gpath->subpath->parent, > gpath->subpath, > target); > } I agree. That's a cleaned-up version of the formula I posted. > However, I continue to doubt that apply_projection_to_path really ought to > know about parallel safety. > > And there is a larger problem with this: I'm not sure that it's > appropriate for apply_projection_to_path to assume that the subpath is not > shared with any other purposes. If it is shared, and we update the > subpath's target in-place, we just broke the other path chains. That's true. I don't see an obvious hazard here, because the Gather's child came from the rel's partial_pathlist, and the only way it gets used from there is to stick the Gather on top of it. So it really can't show up anywhere else. I think. But I agree it's a little scary. (To some lesser extent, apply_projection_to_path is always scary like that.) One idea is to skip generate_gather_paths() for the final rel and then make up the difference later: apply the final rel's target list once we've computed it, and then generate a gather path based on that. But I don't see an obvious way of doing that. > Now the existing uses of apply_projection_to_path don't have to worry > about that because they're always messing with paths whose parent rel > isn't going to be used in any other way. Maybe this one doesn't either > because the subpath would always be a partial path that won't have any > other potential application besides being a child of *this specific* > Gather path. But I'd like to see that addressed in a comment, if so. > > If it's not so, maybe the answer is to always interpose a Result node, > in which case just use create_projection_path() in the above fragment. Mmmph. That seems like a 2-bit solution, but I guess it would work. What if we taught create_projection_plan() to elide the Result node in that case? That is, instead of this: if (tlist_same_exprs(tlist, subplan->targetlist)) We could do this: if (is_projection_capable_path(subplan) || tlist_same_exprs(tlist, subplan->targetlist)) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
Robert Haas writes: > On Wed, Mar 16, 2016 at 2:02 PM, Tom Lane wrote: >> Looks pretty close. One point is that if we do end up using a Result >> node, then the parent GatherPath does not get charged for the Result >> node's cpu_per_tuple overhead. I'm not sure that that's worth changing >> though. It's probably better to bet that the subpath is projectable and >> so no cost will ensue, than to bet the other way. > I'm almost sure this way is the better bet. Actually, we do know what will happen ... so maybe /* * We always use create_projection_path here, even if the subpath is * projection-capable, so as to avoid modifying the subpath in place. * It seems unlikely at present that there could be any other * references to the subpath anyway, but better safe than sorry. */ + if (!is_projection_capable_path(gpath->subpath)) + gpath->path.total_cost += cpu_tuple_cost * gpath->subpath->rows; gpath->subpath = (Path *) create_projection_path(root, gpath->subpath->parent, gpath->subpath, target); The comment could use adjustment if you adopt that, to reference the fact that we know create_projection_plan will get rid of the Result if not needed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Wed, Mar 16, 2016 at 2:25 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Mar 16, 2016 at 2:02 PM, Tom Lane wrote: >>> Looks pretty close. One point is that if we do end up using a Result >>> node, then the parent GatherPath does not get charged for the Result >>> node's cpu_per_tuple overhead. I'm not sure that that's worth changing >>> though. It's probably better to bet that the subpath is projectable and >>> so no cost will ensue, than to bet the other way. > >> I'm almost sure this way is the better bet. > > Actually, we do know what will happen ... so maybe > > /* > * We always use create_projection_path here, even if the subpath is > * projection-capable, so as to avoid modifying the subpath in place. > * It seems unlikely at present that there could be any other > * references to the subpath anyway, but better safe than sorry. > */ > + if (!is_projection_capable_path(gpath->subpath)) > + gpath->path.total_cost += cpu_tuple_cost * gpath->subpath->rows; > gpath->subpath = (Path *) > create_projection_path(root, >gpath->subpath->parent, >gpath->subpath, >target); > > The comment could use adjustment if you adopt that, to reference the fact > that we know create_projection_plan will get rid of the Result if not > needed. OK, I've committed something along those lines. Thanks for the advice, and feel free to whack it around if you have an idea for improving it further - though IMHO this is good enough for 9.6. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
Robert Haas writes: > On Wed, Mar 16, 2016 at 12:36 PM, Tom Lane wrote: >> And there is a larger problem with this: I'm not sure that it's >> appropriate for apply_projection_to_path to assume that the subpath is not >> shared with any other purposes. If it is shared, and we update the >> subpath's target in-place, we just broke the other path chains. > That's true. I don't see an obvious hazard here, because the Gather's > child came from the rel's partial_pathlist, and the only way it gets > used from there is to stick the Gather on top of it. So it really > can't show up anywhere else. I think. The key question I think is could there ever be more than one Gather sharing the same subpath? > (To some lesser extent, apply_projection_to_path is always > scary like that.) Right, that's why there's also create_projection_path for when you aren't sure. > Mmmph. That seems like a 2-bit solution, but I guess it would work. > What if we taught create_projection_plan() to elide the Result node in > that case? Yeah, I was thinking about the same thing. The comment block above where you're looking would need some adjustment. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
> Robert Haas writes: > > On Wed, Mar 16, 2016 at 2:47 PM, Tom Lane wrote: > >> Robert Haas writes: > >>> On Mon, Mar 14, 2016 at 9:21 PM, Kouhei Kaigai > >>> wrote: > So, even though we don't need to define multiple hook declarations, > I think the hook invocation is needed just after create__paths() > for each. It will need to inform extension the context of hook > invocation, the argument list will take UpperRelationKind. > > >>> That actually seems like a pretty good point. Otherwise you can't > >>> push anything from the upper rels down unless you are prepared to > >>> handle all of it. > > >> I'm not exactly convinced of the use-case for that. What external > >> thing is likely to handle window functions but not aggregation, > >> for example? > > > I'm not going to say that you're entirely wrong, but I think that > > attitude is a bit short-sighted. > > Well, I'm prepared to yield to the extent of repeating the hook call > before each phase with an UpperRelationKind parameter to tell which phase > we're about to do. The main concern here is to avoid redundant > computation, but the hook can check the "kind" parameter and fall out > quickly if it has nothing useful to do at the current phase. > > I do not, however, like the proposal to expose wflists and so forth. > Those are internal data structures in grouping_planner and I absolutely > refuse to promise that they're going to stay stable. (I had already > been thinking a couple of weeks ago about revising the activeWindows > data structure, now that it would be reasonably practical to cost out > different orders for doing the window functions in.) I think a hook > that has its own ideas about window function implementation methods > can gather its own information about the WFs without that much extra > cost, and it very probably wouldn't want exactly the same data that > create_window_paths uses today anyway. > > So what I would now propose is > > typedef void (*create_upper_paths_hook_type) (PlannerInfo *root, > UpperRelationKind stage, > RelOptInfo *input_rel); > Hmm... It's not easy to imagine a case that extension wants own idea to extract window functions from the target list and select active windows, even if extension wants to have own executor and own cost estimation logic. In case when extension tries to add WindowPath + CustomPath(Sort), extension is interested in alternative sort task, but not window function itself. It is natural to follow the built-in implementation, thus, it motivates extension author to take copy & paste the code. select_active_windows() is static, so extension needs to have same routine on their side. On the other hands, 'rollup_lists' and 'rollup_groupclauses' need three static functions (extract_rollup_sets(), reorder_grouping_sets() and preprocess_groupclause() to reproduce the equivalent data structure. It is larger copy & paste burden, if extension is not interested in extracting the information related to grouping set. I understand it is not "best", but better to provide extra information needed for extension to reproduce equivalent pathnode, even if fields of UpperPathExtraData structure is not stable right now. > and have this invoked at each stage right before we call > create_grouping_paths, create_window_paths, etc. > It seems to me reasonable. > Also, I don't particularly see a need for a corresponding API for FDWs. > If an FDW is going to do anything in this space, it presumably has to > build up ForeignPaths for all the steps anyway. So I'd be inclined > to leave GetForeignUpperPaths as-is. > It seems to me reasonable. FDW driver which is interested in remote execution of upper path can use the hook arbitrary. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
Robert Haas writes: > OK, I've committed something along those lines. Thanks for the > advice, and feel free to whack it around if you have an idea for > improving it further - though IMHO this is good enough for 9.6. The committed patch looks fine to me. WRT your commit-message comment about pushing down only parallel-safe tlist expressions, I would strongly advise leaving that for another day. I have some irons in the fire concerning ways to push down computation of index expressions (so that we can consistently get f(x) out of an index on f(x) rather than recomputing it), and managing parallel-safe expressions seems like it'd benefit from that infrastucture too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 14, 2016 at 9:21 PM, Kouhei Kaigai wrote: > So, even though we don't need to define multiple hook declarations, > I think the hook invocation is needed just after create__paths() > for each. It will need to inform extension the context of hook > invocation, the argument list will take UpperRelationKind. That actually seems like a pretty good point. Otherwise you can't push anything from the upper rels down unless you are prepared to handle all of it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Fri, Mar 18, 2016 at 10:08 AM, Tom Lane wrote: > Robert Haas writes: >> OK, I've committed something along those lines. Thanks for the >> advice, and feel free to whack it around if you have an idea for >> improving it further - though IMHO this is good enough for 9.6. > > The committed patch looks fine to me. WRT your commit-message comment > about pushing down only parallel-safe tlist expressions, I would strongly > advise leaving that for another day. I have some irons in the fire > concerning ways to push down computation of index expressions (so that > we can consistently get f(x) out of an index on f(x) rather than > recomputing it), and managing parallel-safe expressions seems like it'd > benefit from that infrastucture too. Roger. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
Amit Kapila writes: > While reading above code changes, it looks like it is assuming that subpath > and subplan will always be same (as it is verifying projection capability > of subpath and attaching the tlist to subplan), but I think it is possible > that subpath and subplan correspond to different nodes when gating Result > node is added on to top of scan plan by create_scan_plan(). A little more thought will show you that that's not actually relevant, because the tlist computation would have happened (or not) below the gating Result. If gating Results had an impact on apply_projection_to_path's decisions we'd have had to do something about that before this. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Thu, Mar 17, 2016 at 4:22 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Mar 17, 2016 at 2:31 PM, Tom Lane wrote: >>> Also, I don't particularly see a need for a corresponding API for FDWs. >>> If an FDW is going to do anything in this space, it presumably has to >>> build up ForeignPaths for all the steps anyway. So I'd be inclined >>> to leave GetForeignUpperPaths as-is. > >> No idea if that is going to be a significant limitation or not. >> Doesn't seem like it should be, but what do I know? > > Well, to my mind the GetForeignUpperPaths API is meant to handle the > common case efficiently. An FDW that's not happy with that can always > get into the create_upper_paths_hook instead. OK - sold. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
Robert Haas writes: > That doesn't update the cost of the subpath, which it probably needs to > do. I wonder if this shouldn't be implemented by recursing. > if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, > false)) > apply_projection_to_path(root, something, ((GatherPath *) > path)->subpath, target); > Tom, any comments? I think it would be smart to push this into 9.6. I agree with the idea that this problem should be solved, but there's nothing much to like about the particular details here. If we're going to push down stuff into the workers, I think adding a Result if needed to do that is something we'd definitely want it to do. So more along the lines of if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, false)) { GatherPath *gpath = (GatherPath *) path; gpath->subpath = apply_projection_to_path(root, gpath->subpath->parent, gpath->subpath, target); } However, I continue to doubt that apply_projection_to_path really ought to know about parallel safety. And there is a larger problem with this: I'm not sure that it's appropriate for apply_projection_to_path to assume that the subpath is not shared with any other purposes. If it is shared, and we update the subpath's target in-place, we just broke the other path chains. Now the existing uses of apply_projection_to_path don't have to worry about that because they're always messing with paths whose parent rel isn't going to be used in any other way. Maybe this one doesn't either because the subpath would always be a partial path that won't have any other potential application besides being a child of *this specific* Gather path. But I'd like to see that addressed in a comment, if so. If it's not so, maybe the answer is to always interpose a Result node, in which case just use create_projection_path() in the above fragment. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Wed, Mar 16, 2016 at 10:57 PM, Robert Haas wrote: > > On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane wrote: > > Yeah, I was thinking about the same thing. The comment block above > > where you're looking would need some adjustment. > > OK, how about this? > + * children. Alternatively, apply_projection_to_path might have created + * a projection path as the subpath of a Gather node even though the + * subpath was projection-capable. So, if the subpath is capable of + * projection or the desired tlist is the same expression-wise as the + * subplan's, just jam it in there. We'll have charged for a Result that + * doesn't actually appear in the plan, but that's better than having a + * Result we don't need. */ - if (tlist_same_exprs(tlist, subplan->targetlist)) + if (is_projection_capable_path(best_path->subpath) || + tlist_same_exprs(tlist, subplan->targetlist)) While reading above code changes, it looks like it is assuming that subpath and subplan will always be same (as it is verifying projection capability of subpath and attaching the tlist to subplan), but I think it is possible that subpath and subplan correspond to different nodes when gating Result node is added on to top of scan plan by create_scan_plan(). I think it might be better to explain in comments, why it is safe to rely on projection capability of subpath to attach tlist to subplan. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
Kouhei Kaigai writes: > On Wed, Mar 16, 2016 at 2:47 PM, Tom Lane wrote: >> I do not, however, like the proposal to expose wflists and so forth. >> Those are internal data structures in grouping_planner and I absolutely >> refuse to promise that they're going to stay stable. > Hmm... It's not easy to imagine a case that extension wants own idea > to extract window functions from the target list and select active > windows, even if extension wants to have own executor and own cost > estimation logic. > In case when extension tries to add WindowPath + CustomPath(Sort), > extension is interested in alternative sort task, but not window > function itself. It is natural to follow the built-in implementation, > thus, it motivates extension author to take copy & paste the code. > select_active_windows() is static, so extension needs to have same > routine on their side. Well, to be perfectly blunt about it, I have said from day one that this notion that a CustomScan extension will be able to cause arbitrary planner behavior changes is loony. We are simply not going to drop a hook into every tenth line of the planner for you, nor de-static-ify every internal function, nor (almost equivalently) expose the data structures those functions produce, because it would cripple core planner development to try to keep the implied APIs stable. And I continue to maintain that any actually-generally-useful ideas would be better handled by submitting them as patches to the core planner, rather than trying to implement them in an arms-length extension. In the case at hand, I notice that the WindowFuncLists struct is actually from find_window_functions in clauses.c, so an extension that needed to get hold of that would be unlikely to do any copying and pasting anyhow -- it'd just call find_window_functions again. (That only needs to search the targetlist, so it's not that expensive.) The other lists you mention are all tightly tied to specific, and not terribly well-designed, implementation strategies for grouping sets and window functions. Those are *very* likely to change in the near future; and even if they don't, I'm skeptical that an external implementor of grouping sets or window functions would want to use exactly those same implementation strategies. Maybe the only reason you're there at all is you want to be smarter about the order of doing window functions, for example. So I really don't want to export any of that stuff. As for other details of the proposed patch, I was intending to put all the hook calls into grouping_planner for consistency, rather than scattering them between grouping_planner and its subroutines. So that would probably mean calling the hook for a given step *before* we generate the core paths for that step, not after. Did you have a reason to want the other order? (If you say "so the hook can look at the core-made paths", I'm going to say "that's a bad idea". It'd further increase the coupling between a CustomScan extension and core.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Thu, Mar 17, 2016 at 2:31 PM, Tom Lane wrote: >> Also, I don't particularly see a need for a corresponding API for FDWs. >> If an FDW is going to do anything in this space, it presumably has to >> build up ForeignPaths for all the steps anyway. So I'd be inclined >> to leave GetForeignUpperPaths as-is. > No idea if that is going to be a significant limitation or not. > Doesn't seem like it should be, but what do I know? Well, to my mind the GetForeignUpperPaths API is meant to handle the common case efficiently. An FDW that's not happy with that can always get into the create_upper_paths_hook instead. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Wed, Mar 16, 2016 at 2:47 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Mar 14, 2016 at 9:21 PM, Kouhei Kaigai wrote: >>> So, even though we don't need to define multiple hook declarations, >>> I think the hook invocation is needed just after create__paths() >>> for each. It will need to inform extension the context of hook >>> invocation, the argument list will take UpperRelationKind. > >> That actually seems like a pretty good point. Otherwise you can't >> push anything from the upper rels down unless you are prepared to >> handle all of it. > > I'm not exactly convinced of the use-case for that. What external > thing is likely to handle window functions but not aggregation, > for example? I'm not going to say that you're entirely wrong, but I think that attitude is a bit short-sighted. If somebody comes up with a new, awesome strategy for evaluating window functions, they're not going to be able to make it work in all cases without cut-and-pasting the logic for generating grouping paths. Now maybe nobody will ever come up with such a strategy, but if they do, this will get in the way. Also, KaiGai offered a different example of how this might get in the way. I'm not prepared to do battle over this right now, but I think that there are an awful lot of cases where extension authors haven't been able to quite do what they want to do without core changes because they couldn't get control in quite the right place; or they could do it but they had to cut-and-paste a lot of code. I don't think it requires all that much imagination to see how that could also happen here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Wed, Mar 16, 2016 at 2:02 PM, Tom Lane wrote: > Robert Haas writes: >> On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane wrote: >>> Yeah, I was thinking about the same thing. The comment block above >>> where you're looking would need some adjustment. > >> OK, how about this? > > Looks pretty close. One point is that if we do end up using a Result > node, then the parent GatherPath does not get charged for the Result > node's cpu_per_tuple overhead. I'm not sure that that's worth changing > though. It's probably better to bet that the subpath is projectable and > so no cost will ensue, than to bet the other way. I'm almost sure this way is the better bet. I actually think at present the GatherPath is always on top of a scan or join, and those all project. There might be other cases in the future that don't, but I think it'd be fine to leave off worrying about this until we (a) find a case where it happens and (b) failing to charge for the Result causes a problem. The current situation of never projecting in the workers is far worse. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Wed, Mar 16, 2016 at 3:09 AM, Amit Kapila wrote: > On Wed, Mar 9, 2016 at 11:58 PM, Robert Haas > wrote: > > > > On Wed, Mar 9, 2016 at 12:33 PM, Tom Lane wrote: > > > > > > Gather is a bit weird, because although it can project (and needs to, > > > per the example of needing to compute a non-parallel-safe function), > > > you would rather push down as much work as possible to the child node; > > > and doing so is semantically OK for parallel-safe functions. (Pushing > > > functions down past a Sort node, for a counterexample, is not so OK > > > if you are concerned about function evaluation order, or even number > > > of executions.) > > > > > > In the current code structure it would perhaps be reasonable to teach > > > apply_projection_to_path about that --- although this would require > > > logic to separate parallel-safe and non-parallel-safe subexpressions, > > > which doesn't quite seem like something apply_projection_to_path > > > should be doing. > > > > I think for v1 it would be fine to make this all-or-nothing; that's > > what I had in mind to do. That is, if the entire tlist is > > parallel-safe, push it all down. If not, let the workers just return > > the necessary Vars and have Gather compute the final tlist. > > > > I find it quite convenient to teach apply_projection_to_path() to push > down target-list beneath Gather node, when targetlist contains > parallel-safe expression. Attached patch implements pushing targetlist > beneath gather node. > That doesn't update the cost of the subpath, which it probably needs to do. I wonder if this shouldn't be implemented by recursing. if (IsA(path, GatherPath) && !has_parallel_hazard((Node *) target->exprs, false)) apply_projection_to_path(root, something, ((GatherPath *) path)->subpath, target); Tom, any comments? I think it would be smart to push this into 9.6. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
Robert Haas writes: > On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane wrote: >> Yeah, I was thinking about the same thing. The comment block above >> where you're looking would need some adjustment. > OK, how about this? Looks pretty close. One point is that if we do end up using a Result node, then the parent GatherPath does not get charged for the Result node's cpu_per_tuple overhead. I'm not sure that that's worth changing though. It's probably better to bet that the subpath is projectable and so no cost will ensue, than to bet the other way. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Wed, Mar 16, 2016 at 12:57 PM, Tom Lane wrote: > Yeah, I was thinking about the same thing. The comment block above > where you're looking would need some adjustment. OK, how about this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company diff --git a/src/backend/optimizer/plan/createplan.c b/src/backend/optimizer/plan/createplan.c index e37bdfd..f08f0ea 100644 --- a/src/backend/optimizer/plan/createplan.c +++ b/src/backend/optimizer/plan/createplan.c @@ -1396,18 +1396,21 @@ create_projection_plan(PlannerInfo *root, ProjectionPath *best_path) tlist = build_path_tlist(root, &best_path->path); /* - * Although the ProjectionPath node wouldn't have been made unless its - * pathtarget is different from the subpath's, it can still happen that - * the constructed tlist matches the subplan's. (An example is that - * MergeAppend doesn't project, so we would have thought that we needed a - * projection to attach resjunk sort columns to its output ... but - * create_merge_append_plan might have added those same resjunk sort - * columns to both MergeAppend and its children.) So, if the desired - * tlist is the same expression-wise as the subplan's, just jam it in - * there. We'll have charged for a Result that doesn't actually appear in - * the plan, but that's better than having a Result we don't need. + * We might not really need a Result node here. There are several ways + * that this can happen. For example, MergeAppend doesn't project, so we + * would have thought that we needed a projection to attach resjunk sort + * columns to its output ... but create_merge_append_plan might have + * added those same resjunk sort columns to both MergeAppend and its + * children. Alternatively, apply_projection_to_path might have created + * a projection path as the subpath of a Gather node even though the + * subpath was projection-capable. So, if the subpath is capable of + * projection or the desired tlist is the same expression-wise as the + * subplan's, just jam it in there. We'll have charged for a Result that + * doesn't actually appear in the plan, but that's better than having a + * Result we don't need. */ - if (tlist_same_exprs(tlist, subplan->targetlist)) + if (is_projection_capable_path(best_path->subpath) || + tlist_same_exprs(tlist, subplan->targetlist)) { plan = subplan; plan->targetlist = tlist; diff --git a/src/backend/optimizer/util/pathnode.c b/src/backend/optimizer/util/pathnode.c index b8ea316..5491f36 100644 --- a/src/backend/optimizer/util/pathnode.c +++ b/src/backend/optimizer/util/pathnode.c @@ -,6 +,30 @@ apply_projection_to_path(PlannerInfo *root, path->total_cost += target->cost.startup - oldcost.startup + (target->cost.per_tuple - oldcost.per_tuple) * path->rows; + /* + * If the path happens to be a Gather path, we'd like to arrange for the + * subpath to return the required target list so that workers can help + * project. But if there is something that is not parallel-safe in the + * target expressions, then we can't. + */ + if (IsA(path, GatherPath) && + !has_parallel_hazard((Node *) target->exprs, false)) + { + GatherPath *gpath = (GatherPath *) path; + + /* + * We always use create_projection_path here, even if the subpath is + * projection-capable, so as to avoid modifying the subpath in place. + * It seems unlikely at present that there could be any other + * references to the subpath anyway, but better safe than sorry. + */ + gpath->subpath = (Path *) + create_projection_path(root, + gpath->subpath->parent, + gpath->subpath, + target); + } + return path; } -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
> > Robert Haas writes: > > > On Mon, Mar 14, 2016 at 9:21 PM, Kouhei Kaigai > > > wrote: > > >> So, even though we don't need to define multiple hook declarations, > > >> I think the hook invocation is needed just after create__paths() > > >> for each. It will need to inform extension the context of hook > > >> invocation, the argument list will take UpperRelationKind. > > > > > That actually seems like a pretty good point. Otherwise you can't > > > push anything from the upper rels down unless you are prepared to > > > handle all of it. > > > > I'm not exactly convinced of the use-case for that. What external > > thing is likely to handle window functions but not aggregation, > > for example? > > > WindowPath usually takes a SortPath. Even though extension don't want to > handle window function itself, it may want to add alternative sort logic > than built-in. > Unless it does not calculate expected cost, nobody knows whether WindowPath + > SortPath is really cheaper than WindowPath + CustomPath("GpuSort"). > > The supplied query may require to run group-by prior to window function, > but extension may not be interested in group-by on the other hands, thus, > extension needs to get control around the location where built-in logic > also adds paths to fetch the cheapest path of the underlying paths. > If I would design the hook, I will put its entrypoint at: - tail of create_grouping_paths(), just before set_cheapest() - tail of create_window_paths(), just before set_cheapest() - tail of create_distinct_paths(), just before set_cheapest() - tail of create_ordered_paths(), just before set_cheapest() - tail of grouping_planner(), after the loop of create_modifytable_path() I'm not 100% certain whether the last one is the straightforward idea to provide alternative writing stuff. For example, if an extension has own special storage like columnar format, we may need more consideration whether CustomPath and related stuff are suitable tool. On the other hands, I believe the earlier 4 entrypoints are right location. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei pgsql-v9.6-upper-custom-path.1.patch Description: pgsql-v9.6-upper-custom-path.1.patch -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
> Robert Haas writes: > > On Mon, Mar 14, 2016 at 9:21 PM, Kouhei Kaigai wrote: > >> So, even though we don't need to define multiple hook declarations, > >> I think the hook invocation is needed just after create__paths() > >> for each. It will need to inform extension the context of hook > >> invocation, the argument list will take UpperRelationKind. > > > That actually seems like a pretty good point. Otherwise you can't > > push anything from the upper rels down unless you are prepared to > > handle all of it. > > I'm not exactly convinced of the use-case for that. What external > thing is likely to handle window functions but not aggregation, > for example? > WindowPath usually takes a SortPath. Even though extension don't want to handle window function itself, it may want to add alternative sort logic than built-in. Unless it does not calculate expected cost, nobody knows whether WindowPath + SortPath is really cheaper than WindowPath + CustomPath("GpuSort"). The supplied query may require to run group-by prior to window function, but extension may not be interested in group-by on the other hands, thus, extension needs to get control around the location where built-in logic also adds paths to fetch the cheapest path of the underlying paths. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Mon, Mar 14, 2016 at 9:21 PM, Kouhei Kaigai wrote: >> So, even though we don't need to define multiple hook declarations, >> I think the hook invocation is needed just after create__paths() >> for each. It will need to inform extension the context of hook >> invocation, the argument list will take UpperRelationKind. > That actually seems like a pretty good point. Otherwise you can't > push anything from the upper rels down unless you are prepared to > handle all of it. I'm not exactly convinced of the use-case for that. What external thing is likely to handle window functions but not aggregation, for example? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Thu, Mar 17, 2016 at 2:31 PM, Tom Lane wrote: > Well, I'm prepared to yield to the extent of repeating the hook call > before each phase with an UpperRelationKind parameter to tell which phase > we're about to do. The main concern here is to avoid redundant > computation, but the hook can check the "kind" parameter and fall out > quickly if it has nothing useful to do at the current phase. > > I do not, however, like the proposal to expose wflists and so forth. > Those are internal data structures in grouping_planner and I absolutely > refuse to promise that they're going to stay stable. (I had already > been thinking a couple of weeks ago about revising the activeWindows > data structure, now that it would be reasonably practical to cost out > different orders for doing the window functions in.) I think a hook > that has its own ideas about window function implementation methods > can gather its own information about the WFs without that much extra > cost, and it very probably wouldn't want exactly the same data that > create_window_paths uses today anyway. > > So what I would now propose is > > typedef void (*create_upper_paths_hook_type) (PlannerInfo *root, > UpperRelationKind stage, > RelOptInfo *input_rel); > > and have this invoked at each stage right before we call > create_grouping_paths, create_window_paths, etc. Works for me. > Also, I don't particularly see a need for a corresponding API for FDWs. > If an FDW is going to do anything in this space, it presumably has to > build up ForeignPaths for all the steps anyway. So I'd be inclined > to leave GetForeignUpperPaths as-is. No idea if that is going to be a significant limitation or not. Doesn't seem like it should be, but what do I know? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Thu, Mar 17, 2016 at 7:10 PM, Tom Lane wrote: > > Amit Kapila writes: > > While reading above code changes, it looks like it is assuming that subpath > > and subplan will always be same (as it is verifying projection capability > > of subpath and attaching the tlist to subplan), but I think it is possible > > that subpath and subplan correspond to different nodes when gating Result > > node is added on to top of scan plan by create_scan_plan(). > > A little more thought will show you that that's not actually relevant, > because the tlist computation would have happened (or not) below the > gating Result. If gating Results had an impact on > apply_projection_to_path's decisions we'd have had to do something about > that before this. > I understand that gating Results won't impact it, but it was just not apparent from looking at the code I had referred. If you think it is quite obvious thing, then we can leave the comment as it is. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Wed, Mar 16, 2016 at 2:47 PM, Tom Lane wrote: >> Robert Haas writes: >>> On Mon, Mar 14, 2016 at 9:21 PM, Kouhei Kaigai wrote: So, even though we don't need to define multiple hook declarations, I think the hook invocation is needed just after create__paths() for each. It will need to inform extension the context of hook invocation, the argument list will take UpperRelationKind. >>> That actually seems like a pretty good point. Otherwise you can't >>> push anything from the upper rels down unless you are prepared to >>> handle all of it. >> I'm not exactly convinced of the use-case for that. What external >> thing is likely to handle window functions but not aggregation, >> for example? > I'm not going to say that you're entirely wrong, but I think that > attitude is a bit short-sighted. Well, I'm prepared to yield to the extent of repeating the hook call before each phase with an UpperRelationKind parameter to tell which phase we're about to do. The main concern here is to avoid redundant computation, but the hook can check the "kind" parameter and fall out quickly if it has nothing useful to do at the current phase. I do not, however, like the proposal to expose wflists and so forth. Those are internal data structures in grouping_planner and I absolutely refuse to promise that they're going to stay stable. (I had already been thinking a couple of weeks ago about revising the activeWindows data structure, now that it would be reasonably practical to cost out different orders for doing the window functions in.) I think a hook that has its own ideas about window function implementation methods can gather its own information about the WFs without that much extra cost, and it very probably wouldn't want exactly the same data that create_window_paths uses today anyway. So what I would now propose is typedef void (*create_upper_paths_hook_type) (PlannerInfo *root, UpperRelationKind stage, RelOptInfo *input_rel); and have this invoked at each stage right before we call create_grouping_paths, create_window_paths, etc. Also, I don't particularly see a need for a corresponding API for FDWs. If an FDW is going to do anything in this space, it presumably has to build up ForeignPaths for all the steps anyway. So I'd be inclined to leave GetForeignUpperPaths as-is. Comments? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
> -Original Message- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane > Sent: Friday, March 18, 2016 11:44 PM > To: Kaigai Kouhei(海外 浩平) > Cc: Robert Haas; Petr Jelinek; David Rowley; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP: Upper planner pathification > > Kouhei Kaigai writes: > > On Wed, Mar 16, 2016 at 2:47 PM, Tom Lane wrote: > >> I do not, however, like the proposal to expose wflists and so forth. > >> Those are internal data structures in grouping_planner and I absolutely > >> refuse to promise that they're going to stay stable. > > > Hmm... It's not easy to imagine a case that extension wants own idea > > to extract window functions from the target list and select active > > windows, even if extension wants to have own executor and own cost > > estimation logic. > > In case when extension tries to add WindowPath + CustomPath(Sort), > > extension is interested in alternative sort task, but not window > > function itself. It is natural to follow the built-in implementation, > > thus, it motivates extension author to take copy & paste the code. > > select_active_windows() is static, so extension needs to have same > > routine on their side. > > Well, to be perfectly blunt about it, I have said from day one that this > notion that a CustomScan extension will be able to cause arbitrary planner > behavior changes is loony. We are simply not going to drop a hook into > every tenth line of the planner for you, nor de-static-ify every internal > function, nor (almost equivalently) expose the data structures those > functions produce, because it would cripple core planner development to > try to keep the implied APIs stable. And I continue to maintain that any > actually-generally-useful ideas would be better handled by submitting them > as patches to the core planner, rather than trying to implement them in an > arms-length extension. > > In the case at hand, I notice that the WindowFuncLists struct is > actually from find_window_functions in clauses.c, so an extension > that needed to get hold of that would be unlikely to do any copying > and pasting anyhow -- it'd just call find_window_functions again. > (That only needs to search the targetlist, so it's not that expensive.) > The other lists you mention are all tightly tied to specific, and not > terribly well-designed, implementation strategies for grouping sets and > window functions. Those are *very* likely to change in the near future; > and even if they don't, I'm skeptical that an external implementor of > grouping sets or window functions would want to use exactly those same > implementation strategies. Maybe the only reason you're there at all > is you want to be smarter about the order of doing window functions, > for example. > > So I really don't want to export any of that stuff. > Hmm. I could understand we have active development around this area thus miscellaneous internal data structure may not be enough stable to expose the extension. Don't you deny recall the discussion once implementation gets calmed down on the future development cycle? > As for other details of the proposed patch, I was intending to put > all the hook calls into grouping_planner for consistency, rather than > scattering them between grouping_planner and its subroutines. So that > would probably mean calling the hook for a given step *before* we > generate the core paths for that step, not after. Did you have a > reason to want the other order? (If you say "so the hook can look > at the core-made paths", I'm going to say "that's a bad idea". It'd > further increase the coupling between a CustomScan extension and core.) > No deep reason. I just followed the manner in scan/join path hook; that calls extension once the core feature adds built-in path nodes. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Pushdown target list below gather node (WAS Re: [HACKERS] WIP: Upper planner pathification)
On Wed, Mar 9, 2016 at 11:58 PM, Robert Haas wrote: > > On Wed, Mar 9, 2016 at 12:33 PM, Tom Lane wrote: > > > > Gather is a bit weird, because although it can project (and needs to, > > per the example of needing to compute a non-parallel-safe function), > > you would rather push down as much work as possible to the child node; > > and doing so is semantically OK for parallel-safe functions. (Pushing > > functions down past a Sort node, for a counterexample, is not so OK > > if you are concerned about function evaluation order, or even number > > of executions.) > > > > In the current code structure it would perhaps be reasonable to teach > > apply_projection_to_path about that --- although this would require > > logic to separate parallel-safe and non-parallel-safe subexpressions, > > which doesn't quite seem like something apply_projection_to_path > > should be doing. > > I think for v1 it would be fine to make this all-or-nothing; that's > what I had in mind to do. That is, if the entire tlist is > parallel-safe, push it all down. If not, let the workers just return > the necessary Vars and have Gather compute the final tlist. > I find it quite convenient to teach apply_projection_to_path() to push down target-list beneath Gather node, when targetlist contains parallel-safe expression. Attached patch implements pushing targetlist beneath gather node. Below is output of a simple test which shows the effect of implementation. Without Patch - postgres=# explain verbose select c1+2 from t1 where c1<10; QUERY PLAN - Gather (cost=0.00..44420.43 rows=30 width=4) Output: (c1 + 2) Number of Workers: 2 -> Parallel Seq Scan on public.t1 (cost=0.00..44420.35 rows=13 width=4) Output: c1 Filter: (t1.c1 < 10) (6 rows) With Patch - --- postgres=# explain verbose select c1+2 from t1 where c1<10; QUERY PLAN - Gather (cost=0.00..45063.75 rows=30 width=4) Output: ((c1 + 2)) Number of Workers: 1 -> Parallel Seq Scan on public.t1 (cost=0.00..45063.68 rows=18 width=4) Output: (c1 + 2) Filter: (t1.c1 < 10) (6 rows) In the above plans, you can notice that target list expression (c1 + 2) is pushed beneath Gather node after patch. Thoughts? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com parallel-tlist-pushdown-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
> -Original Message- > From: Tom Lane [mailto:t...@sss.pgh.pa.us] > Sent: Tuesday, March 15, 2016 2:04 AM > To: Petr Jelinek > Cc: Kaigai Kouhei(海外 浩平); David Rowley; Robert Haas; > pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP: Upper planner pathification > > Petr Jelinek writes: > > On 14/03/16 02:43, Kouhei Kaigai wrote: > >> Even though I couldn't check the new planner implementation entirely, > >> it seems to be the points below are good candidate to inject CustomPath > >> (and potentially ForeignScan). > >> > >> - create_grouping_paths > >> - create_window_paths > >> - create_distinct_paths > >> - create_ordered_paths > >> - just below of the create_modifytable_path > >> (may be valuable for foreign-update pushdown) > > > To me that seems too low inside the planning tree, perhaps adding it > > just to the subquery_planner before SS_identify_outer_params would be > > better, that's the place where you see the path for the whole (sub)query > > so you can search and modify what you need from there. > > I don't like either of those too much. The main thing I've noticed over > the past few days is that you can't readily generate custom upper-level > Paths unless you know what PathTarget grouping_planner is expecting each > level to produce. So what I was toying with doing is (1) having > grouping_planner put all those targets into the PlannerInfo, perhaps > in an array indexed by UpperRelationKind; and (2) adding a hook call > immediately after those targets are computed, say right before > the create_grouping_paths() call (approximately planner.c:1738 > in HEAD). It should be sufficient to have one hook there since > you can inject Paths into any of the upper relations at that point; > moreover, that's late enough that you shouldn't have to recompute > anything you figured out during scan/join planning. > Regarding to the (2), I doubt whether the location is reasonable, because pathlist of each upper_rels[] are still empty, aren't it? It will make extension not-easy to construct its own CustomPath that takes underlying built-in pathnodes. For example, an extension implements its own sort logic but not interested in group-by/window function, it shall want to add its CustomPath to UPPERREL_ORDERED, however, it does not know which is the input_rel and no built-in paths are not added yet at the point of create_upper_paths_hook(). On the other hands, here is another problem if we put a hook after all the upper paths done. In this case, built-in create__paths() functions cannot pay attention for CustomPath to be added later when these functions pick up the cheapest path. So, even though we don't need to define multiple hook declarations, I think the hook invocation is needed just after create__paths() for each. It will need to inform extension the context of hook invocation, the argument list will take UpperRelationKind. In addition, extension cannot reference some local variables from the root structure, like: - rollup_lists - rollup_groupclauses - wflists - activeWindows - have_postponed_srfs As we are doing at set_join_pathlist_hook, it is good idea to define UpperPathExtraData structure to pack misc information. So, how about to re-define the hook as follows? typedef void (*create_upper_paths_hook_type) (UpperRelationKind upper_kind, PlannerInfo *root, RelOptInfo *scan_join_rel, UpperPathExtraData *extra); Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Mon, Mar 14, 2016 at 1:37 PM, Tom Lane wrote: >> Yeah. An alternative definition that would support that would be to >> call the upper-path-providing callback for each FDW that's responsible >> for any base relation of the query. But I think that that would often >> lead to a lot of redundant/wasted computation, and it's not clear to >> me that we can support such cases without other changes as well. > Sure, that's fine with me. Are you going to go make these changes now? So I started looking at this, and almost immediately came to two conclusions: 1. We need to add a "PathTarget *" parameter to create_foreignscan_path. The hard-wired assignment that's in there now, pathnode->path.pathtarget = &(rel->reltarget); is sufficient for scan and join paths, but it doesn't work at all for upper relations because we don't put anything in their reltarget fields. We could do so, but it's still problematic because there will be situations where non-topmost Paths associated with an upperrel have other targets than what topmost Paths do. This is true today for set-operation Paths and WindowAgg Paths, and I think possibly other cases as well (such as inserted projection nodes). We could possibly insist on creating a separate upperrel for every distinct target that's used by intermediate levels of Path in these trees, but that seems kinda pointless to me, at least for now. Really the best answer is to let FDWs have control of it. And it's not like we've never whacked this function's API around before. 2. I was being way too cycle-miserly when I decided to make RelOptInfo.reltarget be an embedded struct rather than defining PathTarget as a regular, separate node type. I'm gonna change that before it's too late. One extra palloc per RelOptInfo is not a noticeable price, and there are too many places where this choice is resulting in notational weirdness. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Mon, Mar 14, 2016 at 1:37 PM, Tom Lane wrote: >> Yeah. An alternative definition that would support that would be to >> call the upper-path-providing callback for each FDW that's responsible >> for any base relation of the query. But I think that that would often >> lead to a lot of redundant/wasted computation, and it's not clear to >> me that we can support such cases without other changes as well. > Sure, that's fine with me. Are you going to go make these changes now? Yeah, in a bit. > Eventually, we might just support a configurable flag on FDWs where > FDWs that want to do this sort of thing can request callbacks on every > join and every upper rel in the query. But that can wait. That'd be a possibility, too. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 14, 2016 at 1:37 PM, Tom Lane wrote: > Robert Haas writes: >> On Mon, Mar 14, 2016 at 1:04 PM, Tom Lane wrote: >>> It would be better if we invent an FDW callback that's meant to be >>> invoked at this stage, but only call it for FDW(s) actively involved >>> in the query. I'm not sure exactly what that ought to look like though. >>> Maybe only call the FDW identified as possible owner of the topmost >>> scan/join relation? > >> I think in the short term that's as well as we're going to do, so +1. >> In the long run, I'm interested in making FDWs be able to optimize >> queries like foreigntab JOIN localtab ON foreigntab.x = localtab.x >> (e.g. by copying localtab to the remote side when it's small); but >> that will require revisiting some old decisions, too. > > Yeah. An alternative definition that would support that would be to > call the upper-path-providing callback for each FDW that's responsible > for any base relation of the query. But I think that that would often > lead to a lot of redundant/wasted computation, and it's not clear to > me that we can support such cases without other changes as well. Sure, that's fine with me. Are you going to go make these changes now? Eventually, we might just support a configurable flag on FDWs where FDWs that want to do this sort of thing can request callbacks on every join and every upper rel in the query. But that can wait. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Mon, Mar 14, 2016 at 1:04 PM, Tom Lane wrote: >> It would be better if we invent an FDW callback that's meant to be >> invoked at this stage, but only call it for FDW(s) actively involved >> in the query. I'm not sure exactly what that ought to look like though. >> Maybe only call the FDW identified as possible owner of the topmost >> scan/join relation? > I think in the short term that's as well as we're going to do, so +1. > In the long run, I'm interested in making FDWs be able to optimize > queries like foreigntab JOIN localtab ON foreigntab.x = localtab.x > (e.g. by copying localtab to the remote side when it's small); but > that will require revisiting some old decisions, too. Yeah. An alternative definition that would support that would be to call the upper-path-providing callback for each FDW that's responsible for any base relation of the query. But I think that that would often lead to a lot of redundant/wasted computation, and it's not clear to me that we can support such cases without other changes as well. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 14, 2016 at 1:04 PM, Tom Lane wrote: > Petr Jelinek writes: >> On 14/03/16 02:43, Kouhei Kaigai wrote: >>> Even though I couldn't check the new planner implementation entirely, >>> it seems to be the points below are good candidate to inject CustomPath >>> (and potentially ForeignScan). >>> >>> - create_grouping_paths >>> - create_window_paths >>> - create_distinct_paths >>> - create_ordered_paths >>> - just below of the create_modifytable_path >>> (may be valuable for foreign-update pushdown) > >> To me that seems too low inside the planning tree, perhaps adding it >> just to the subquery_planner before SS_identify_outer_params would be >> better, that's the place where you see the path for the whole (sub)query >> so you can search and modify what you need from there. > > I don't like either of those too much. The main thing I've noticed over > the past few days is that you can't readily generate custom upper-level > Paths unless you know what PathTarget grouping_planner is expecting each > level to produce. So what I was toying with doing is (1) having > grouping_planner put all those targets into the PlannerInfo, perhaps > in an array indexed by UpperRelationKind; and (2) adding a hook call > immediately after those targets are computed, say right before > the create_grouping_paths() call (approximately planner.c:1738 > in HEAD). It should be sufficient to have one hook there since > you can inject Paths into any of the upper relations at that point; > moreover, that's late enough that you shouldn't have to recompute > anything you figured out during scan/join planning. That is a nice set of characteristics for a hook. > Now, a simple hook function is probably about the best we can offer > CustomScan providers, since we don't really know what they're going > to do. But I'm pretty unenthused about telling FDW authors to use > such a hook, because generally speaking they're simply going to waste > cycles finding out that they aren't involved in a given query. > It would be better if we invent an FDW callback that's meant to be > invoked at this stage, but only call it for FDW(s) actively involved > in the query. I'm not sure exactly what that ought to look like though. > Maybe only call the FDW identified as possible owner of the topmost > scan/join relation? I think in the short term that's as well as we're going to do, so +1. In the long run, I'm interested in making FDWs be able to optimize queries like foreigntab JOIN localtab ON foreigntab.x = localtab.x (e.g. by copying localtab to the remote side when it's small); but that will require revisiting some old decisions, too. What you're proposing here sounds consistent with what we've done so far. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Petr Jelinek writes: > On 14/03/16 02:43, Kouhei Kaigai wrote: >> Even though I couldn't check the new planner implementation entirely, >> it seems to be the points below are good candidate to inject CustomPath >> (and potentially ForeignScan). >> >> - create_grouping_paths >> - create_window_paths >> - create_distinct_paths >> - create_ordered_paths >> - just below of the create_modifytable_path >> (may be valuable for foreign-update pushdown) > To me that seems too low inside the planning tree, perhaps adding it > just to the subquery_planner before SS_identify_outer_params would be > better, that's the place where you see the path for the whole (sub)query > so you can search and modify what you need from there. I don't like either of those too much. The main thing I've noticed over the past few days is that you can't readily generate custom upper-level Paths unless you know what PathTarget grouping_planner is expecting each level to produce. So what I was toying with doing is (1) having grouping_planner put all those targets into the PlannerInfo, perhaps in an array indexed by UpperRelationKind; and (2) adding a hook call immediately after those targets are computed, say right before the create_grouping_paths() call (approximately planner.c:1738 in HEAD). It should be sufficient to have one hook there since you can inject Paths into any of the upper relations at that point; moreover, that's late enough that you shouldn't have to recompute anything you figured out during scan/join planning. Now, a simple hook function is probably about the best we can offer CustomScan providers, since we don't really know what they're going to do. But I'm pretty unenthused about telling FDW authors to use such a hook, because generally speaking they're simply going to waste cycles finding out that they aren't involved in a given query. It would be better if we invent an FDW callback that's meant to be invoked at this stage, but only call it for FDW(s) actively involved in the query. I'm not sure exactly what that ought to look like though. Maybe only call the FDW identified as possible owner of the topmost scan/join relation? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
> -Original Message- > From: Petr Jelinek [mailto:p...@2ndquadrant.com] > Sent: Monday, March 14, 2016 12:18 PM > To: Kaigai Kouhei(海外 浩平); Tom Lane; David Rowley > Cc: Robert Haas; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP: Upper planner pathification > > On 14/03/16 02:43, Kouhei Kaigai wrote: > > > > CustomPath node is originally expected to generate various kind of plan > > node, not only scan/join, and its interface is designed to support them. > > For example, we can expect a CustomPath that generates "CustomSort". > > > > On the other hands, upper path consideration is more variable than the > > case of scan/join path consideration. Probably, we can have no centralized > > point to add custom-paths for sort, group-by, ... > > So, I think we have hooks for each (supported) upper path workload. > > > > In case of sorting for example, the best location of the hook is just > > above of the Assert() in the create_ordered_paths(). It allows to compare > > estimated cost between SortPath and CustomPath. > > However, it does not allow to inject CustomPath(for sort) into the path > > node that may involve sorting, like WindowPath or AggPath. > > Thus, another hook may be put on create_window_paths and > > create_grouping_paths in my thought. > > > > Some other good idea? > > > > Even though I couldn't check the new planner implementation entirely, > > it seems to be the points below are good candidate to inject CustomPath > > (and potentially ForeignScan). > > > > - create_grouping_paths > > - create_window_paths > > - create_distinct_paths > > - create_ordered_paths > > - just below of the create_modifytable_path > >(may be valuable for foreign-update pushdown) > > > > To me that seems too low inside the planning tree, perhaps adding it > just to the subquery_planner before SS_identify_outer_params would be > better, that's the place where you see the path for the whole (sub)query > so you can search and modify what you need from there. > Thanks for your idea. Yes, I also thought a similar point; where all the path consideration get completed. It indeed allows extensions to walk down the path tree and replace a part of them. However, when we want to inject CustomPath under the built-in paths, extension has to re-calculate cost of the built-in paths again. Perhaps, it affects to the decision of built-in path selection. So, I concluded that it is not realistic to re-implement equivalent upper planning stuff in the extension side, if we put the hook after all the planning works done. If extension can add its CustomPath at create_grouping_paths(), the later steps, like create_window_paths, stands on the estimated cost of the CustomPath. Thus, extension don't need to know the detail of the entire upper planning. Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 14/03/16 02:43, Kouhei Kaigai wrote: > > CustomPath node is originally expected to generate various kind of plan > node, not only scan/join, and its interface is designed to support them. > For example, we can expect a CustomPath that generates "CustomSort". > > On the other hands, upper path consideration is more variable than the > case of scan/join path consideration. Probably, we can have no centralized > point to add custom-paths for sort, group-by, ... > So, I think we have hooks for each (supported) upper path workload. > > In case of sorting for example, the best location of the hook is just > above of the Assert() in the create_ordered_paths(). It allows to compare > estimated cost between SortPath and CustomPath. > However, it does not allow to inject CustomPath(for sort) into the path > node that may involve sorting, like WindowPath or AggPath. > Thus, another hook may be put on create_window_paths and > create_grouping_paths in my thought. > > Some other good idea? > > Even though I couldn't check the new planner implementation entirely, > it seems to be the points below are good candidate to inject CustomPath > (and potentially ForeignScan). > > - create_grouping_paths > - create_window_paths > - create_distinct_paths > - create_ordered_paths > - just below of the create_modifytable_path >(may be valuable for foreign-update pushdown) > To me that seems too low inside the planning tree, perhaps adding it just to the subquery_planner before SS_identify_outer_params would be better, that's the place where you see the path for the whole (sub)query so you can search and modify what you need from there. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Hello, I'm now checking the new planner implementation to find out the way to integrate CustomPath to the upper planner also. CustomPath node is originally expected to generate various kind of plan node, not only scan/join, and its interface is designed to support them. For example, we can expect a CustomPath that generates "CustomSort". On the other hands, upper path consideration is more variable than the case of scan/join path consideration. Probably, we can have no centralized point to add custom-paths for sort, group-by, ... So, I think we have hooks for each (supported) upper path workload. In case of sorting for example, the best location of the hook is just above of the Assert() in the create_ordered_paths(). It allows to compare estimated cost between SortPath and CustomPath. However, it does not allow to inject CustomPath(for sort) into the path node that may involve sorting, like WindowPath or AggPath. Thus, another hook may be put on create_window_paths and create_grouping_paths in my thought. Some other good idea? Even though I couldn't check the new planner implementation entirely, it seems to be the points below are good candidate to inject CustomPath (and potentially ForeignScan). - create_grouping_paths - create_window_paths - create_distinct_paths - create_ordered_paths - just below of the create_modifytable_path (may be valuable for foreign-update pushdown) Thanks, -- NEC Business Creation Division / PG-Strom Project KaiGai Kohei > -Original Message- > From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tom Lane > Sent: Saturday, March 05, 2016 3:02 AM > To: David Rowley > Cc: Robert Haas; pgsql-hackers@postgresql.org > Subject: Re: [HACKERS] WIP: Upper planner pathification > > OK, here is a version that I think addresses all of the recent comments: > > * I refactored the grouping-sets stuff as suggested by Robert and David. > The GroupingSetsPath code is now used *only* when there are grouping sets, > otherwise what you get is a plain AGG_SORTED AggPath. This allowed > removal of a boatload of weird corner cases in the GroupingSets code path, > so it was a good change. (Fundamentally, that's cleaning up some > questionable coding in the grouping sets patch rather than fixing anything > directly related to pathification, but I like the code better now.) > > * I refactored the handling of targetlists in createplan.c. After some > reflection I decided that the disuse_physical_tlist callers fell into > three separate categories: those that actually needed the exact requested > tlist to be returned, those that wanted non-bloated tuples because they > were going to put them into sort or hash storage, and those that needed > grouping columns to be properly labeled. The new approach is to pass down > a "flags" word that specifies which if any of these cases apply at a > specific plan level. use_physical_tlist now always makes the right > decision to start with, and disuse_physical_tlist is gone entirely, which > should make things a bit faster since we won't uselessly construct and > discard physical tlists. The missing logic from make_subplanTargetList > and locate_grouping_columns is reincarnated in the physical-tlist code. > > * Added explicit limit/offset fields to LimitPath, as requested by Teodor. > > * Removed SortPath.sortgroupclauses. > > * Fixed handling of parallel-query fields in new path node types. > (BTW, I found what seemed to be a couple of pre-existing bugs of > the same kind, eg create_mergejoin_path was different from the > other two kinds of join as to setting parallel_degree.) > > > What remains to be done, IMV: > > * Performance testing as per yesterday's discussion. > > * Debug support in outfuncs.c and print_path() for new node types. > > * Clean up unfinished work on function header comments. > > * Write some documentation about how FDWs might use this. > > I'll work on the performance testing next. Barring unsatisfactory > results from that, I think this could be committable in a couple > of days. > > regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Andres Freund writes: > On 2016-03-12 12:22:01 -0500, Tom Lane wrote: >> I wonder whether that's pathification per se. > If you're interested enough, I've uploaded a dump of the schema relevant > table to http://anarazel.de/t/lineitem_95_96_plan.dump.gz I haven't dug into it, but I'll bet this is a case of add_path deciding that the GroupAgg plan is fuzzily the same cost and better sorted (ie, it produces *some* sort order, versus none for the hash), so it kicks the hash plan out. Again, that would not have happened with the old hard-wired cost comparisons in grouping_planner, because they considered no factors other than an exact cost comparison. > I've not yet looked deep enough to determine the root cause; I did > however notice that set enable_sort = false; yields a cheaper plan than > the default one, within the fuzz range (137.91..137.93 vs 138.43..139.02). Yeah, you're just forcing it to choose the hash plan again. But that's within the cost fuzz range, so it's a legitimate choice. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 2016-03-12 12:22:01 -0500, Tom Lane wrote: > Andres Freund writes: > > On 2016-03-10 23:38:14 -0500, Tom Lane wrote: > >> I'll do it ... just send me the list. > > > After exporting make_agg, make_limit, make_sort_from_sortclauses and > > making some trivial adjustments due to the pull_var_clause changes > > change, Citus' tests pass on 9.6, bar some noise. > > OK, done. Thanks. > > Pathification made > > some plans switch from hash-agg to sort-agg, and the other way round; > > but that's obviously ok. > > I wonder whether that's pathification per se. If you're interested enough, I've uploaded a dump of the schema relevant table to http://anarazel.de/t/lineitem_95_96_plan.dump.gz the affected query is (after ANALYZE lineitem_102009) EXPLAIN SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg, array_agg(l_orderkey) AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey > 5500) AND (l_orderkey < 9500)) GROUP BY l_quantity; =# SELECT version(); ┌──┐ │ version │ ├──┤ │ PostgreSQL 9.5.1 on x86_64-pc-linux-gnu, compiled by gcc-5.real (Debian 5.3.1-10) 5.3.1 20160224, 64-bit │ └──┘ (1 row) =# SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg, array_agg(l_orderkey) AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey > 5500) AND (l_orderkey < 9500)) GROUP BY l_quantity; ┌┬───┬──┬─┬┐ │ l_quantity │ count │ avg│ avg │ array_agg │ ├┼───┼──┼─┼┤ │ 1.00 │ 9 │ 13044.06 │ 9 │ {8997,9026,9158,9184,9220,9222,9348,9383,9476} │ │ 4.00 │ 7 │ 40868.84 │ 7 │ {9091,9120,9281,9347,9382,9440,9473} │ │ 2.00 │ 8 │ 26072.02 │ 8 │ {9030,9058,9123,9124,9188,9344,9441,9476} │ │ 3.00 │ 9 │ 39925.32 │ 9 │ {9124,9157,9184,9223,9254,9349,9414,9475,9477} │ └┴───┴──┴─┴┘ (4 rows) Time: 0.906 ms =# EXPLAIN SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg, array_agg(l_orderkey) AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey > 5500) AND (l_orderkey < 9500)) GROUP BY l_quantity; ┌┐ │ QUERY PLAN │ ├┤ │ HashAggregate (cost=137.91..137.93 rows=1 width=21) │ │ Group Key: l_quantity │ │ -> Bitmap Heap Scan on lineitem_102009 lineitem (cost=13.07..137.44 rows=38 width=21) │ │ Recheck Cond: ((l_orderkey > 5500) AND (l_orderkey < 9500)) │ │ Filter: (l_quantity < '5'::numeric) │ │ -> Bitmap Index Scan on lineitem_pkey_102009 (cost=0.00..13.06 rows=478 width=0) │ │ Index Cond: ((l_orderkey > 5500) AND (l_orderkey < 9500)) │ └ vs. =# SELECT version(); ┌─── │ version ├─── │ PostgreSQL 9.6devel on x86_64-pc-linux-gnu, compiled by gcc-6.real (Debian 6-20160228-1) 6.0.0 20160228 (experimental) [trunk revision └─── (1 row) =# SELECT l_quantity, count(*) AS count, sum(l_extendedprice) AS avg, count(l_extendedprice) AS avg, array_agg(l_orderkey) AS array_agg FROM lineitem_102009 lineitem WHERE ((l_quantity < '5'::numeric) AND (l_orderkey > 5500) AND (l_orderkey < 9500)) GROUP BY l_quantity; ┌┬───┬──┬─┬─
Re: [HACKERS] WIP: Upper planner pathification
Andres Freund writes: > On 2016-03-10 23:38:14 -0500, Tom Lane wrote: >> I'll do it ... just send me the list. > After exporting make_agg, make_limit, make_sort_from_sortclauses and > making some trivial adjustments due to the pull_var_clause changes > change, Citus' tests pass on 9.6, bar some noise. OK, done. > Pathification made > some plans switch from hash-agg to sort-agg, and the other way round; > but that's obviously ok. I wonder whether that's pathification per se. Of the three core regression test EXPLAINs that changed in the pathification commit, two actually were a case of finding better plans. The other one was a random-looking swap between two plans with near-identical costs. When I looked into it, I found that the reason the planner liked the new plan better was that it was parallel-safe; add_path() saw the costs as fuzzily equal and allowed parallel-safe to be the determining factor in the choice. The old code hadn't done that because the hard-wired cost comparisons in grouping_planner() never took parallel-safety into account. But I'd call that a parallelism change, not a pathification change; it would certainly have appeared to be that if the patches had gone in in the opposite order. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 2016-03-10 23:38:14 -0500, Tom Lane wrote: > > Would you rather add back the exports or should I? > > I'll do it ... just send me the list. After exporting make_agg, make_limit, make_sort_from_sortclauses and making some trivial adjustments due to the pull_var_clause changes change, Citus' tests pass on 9.6, bar some noise. Pathification made some plans switch from hash-agg to sort-agg, and the other way round; but that's obviously ok. Thanks, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Andres Freund writes: > On 2016-03-10 15:03:41 -0500, Tom Lane wrote: >> What it encourages is having module boundaries that actually mean >> something, as well as code that can be refactored without having >> to worry about which extensions will complain about it. > I personally think it's entirely fine to break extensions if it's adding > or removing a few parameters or somesuch. That's easy enough fixed. I don't want to promise that the *behavior* of those functions remains stable. As an example, none of them any longer do any internal cost calculations, which is a change that doesn't directly show in their argument lists but will break extensions just as surely (and more silently) as an argument-list change would do. And no, that change is NOT going to get undone. > Would you rather add back the exports or should I? I'll do it ... just send me the list. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Hi, On 2016-03-10 15:03:41 -0500, Tom Lane wrote: > Andres Freund writes: > > Primarily because create_plan(), and/or its children, have to know about > > what you're doing; you can hide some, but not all, things below > > CustomScan nodes. > > And which of those things does e.g. setrefs.c not need to know about? For CustomScans? You can mostly get by with ->custom_exprs. > > ISTM, that there's good enough reasons to go either way; I don't see > > what we're gaining by making these private. That just encourages > > copy-paste coding. > > What it encourages is having module boundaries that actually mean > something, as well as code that can be refactored without having > to worry about which extensions will complain about it. I personally think it's entirely fine to break extensions if it's adding or removing a few parameters or somesuch. That's easy enough fixed. > I will yield on this point because it's not worth my time to argue about > it, but I continue to say that it's a bad decision you will regret. FWIW, I do agree that it'd be much nicer to use the new API; the biggest user in Citus should be able to work with that. But it's not that easy to do that and still support postgres 9.4/9.5. > Which functions did you need exactly? I'm not exporting more than > I have to. I'll try to do the port tomorrow; to make sure I have the definitive list. Afaics it's just make_seqscan, make_sort_from_sortclauses, make_limit, make_agg. I'd not however be surprised if other extensions also use some of these. Would you rather add back the exports or should I? Greetings, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Andres Freund writes: > On 2016-03-10 14:16:03 -0500, Tom Lane wrote: >> I don't deny that you *could* continue to do things that way, but >> I dispute that it's a good idea. Why can't you generate a Path tree >> and then ask create_plan() to convert it? > Primarily because create_plan(), and/or its children, have to know about > what you're doing; you can hide some, but not all, things below > CustomScan nodes. And which of those things does e.g. setrefs.c not need to know about? > ISTM, that there's good enough reasons to go either way; I don't see > what we're gaining by making these private. That just encourages > copy-paste coding. What it encourages is having module boundaries that actually mean something, as well as code that can be refactored without having to worry about which extensions will complain about it. I will yield on this point because it's not worth my time to argue about it, but I continue to say that it's a bad decision you will regret. Which functions did you need exactly? I'm not exporting more than I have to. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Thu, Mar 10, 2016 at 8:45 PM, Robert Haas wrote: > On Thu, Mar 10, 2016 at 2:41 PM, Andres Freund wrote: > > ISTM, that there's good enough reasons to go either way; I don't see > > what we're gaining by making these private. That just encourages > > copy-paste coding. > > +1. Frustrating Citus's attempt to open-source their stuff is not in > the project's interest. > > Agreed. And it's not like we're very restrictive with this in a lot of other parts of the code. While we shouldn't go out of our way for forks/extensions, this seems like a very reasonable tradeoff. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
Re: [HACKERS] WIP: Upper planner pathification
On Thu, Mar 10, 2016 at 11:45 AM, Robert Haas wrote: > On Thu, Mar 10, 2016 at 2:41 PM, Andres Freund wrote: >> ISTM, that there's good enough reasons to go either way; I don't see >> what we're gaining by making these private. That just encourages >> copy-paste coding. > > +1. Frustrating Citus's attempt to open-source their stuff is not in > the project's interest. I agree. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Thu, Mar 10, 2016 at 2:41 PM, Andres Freund wrote: > ISTM, that there's good enough reasons to go either way; I don't see > what we're gaining by making these private. That just encourages > copy-paste coding. +1. Frustrating Citus's attempt to open-source their stuff is not in the project's interest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 2016-03-10 14:16:03 -0500, Tom Lane wrote: > Andres Freund writes: > > In Citus' case a full PlannedStmt is generated on the master node, to > > combine the data generated on worker nodes (where the bog standard > > postgres planner is used). It's not the only way to do things, but I > > don't see why the approach would be entirely invalidated by the > > pathification work. > > I don't deny that you *could* continue to do things that way, but > I dispute that it's a good idea. Why can't you generate a Path tree > and then ask create_plan() to convert it? Primarily because create_plan(), and/or its children, have to know about what you're doing; you can hide some, but not all, things below CustomScan nodes. Secondarily, as an extension you will often have to support several major versions. ISTM, that there's good enough reasons to go either way; I don't see what we're gaining by making these private. That just encourages copy-paste coding. Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Andres Freund writes: > On 2016-03-10 13:48:31 -0500, Tom Lane wrote: >> That was intentional: in my opinion, nothing outside createplan.c ought >> to be making Plan nodes anymore. The expectation is that you make a >> Path describing what you want. Can you explain why, in the new planner >> structure, it would be sane to have external callers of these functions? > In Citus' case a full PlannedStmt is generated on the master node, to > combine the data generated on worker nodes (where the bog standard > postgres planner is used). It's not the only way to do things, but I > don't see why the approach would be entirely invalidated by the > pathification work. I don't deny that you *could* continue to do things that way, but I dispute that it's a good idea. Why can't you generate a Path tree and then ask create_plan() to convert it? Otherwise you're buying into knowing a whole lot about the internals of createplan.c, and having to track changes therein. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 2016-03-10 13:48:31 -0500, Tom Lane wrote: > Andres Freund writes: > > I see that you made a lot of formerly externally visible make_* routines > > static. The Citus extension (which will be open source in a few days) > > uses several of these (make_agg, make_sort_from_sortclauses, make_limit > > at least). > > > Can we please re-consider making these static? > > That was intentional: in my opinion, nothing outside createplan.c ought > to be making Plan nodes anymore. The expectation is that you make a > Path describing what you want. Can you explain why, in the new planner > structure, it would be sane to have external callers of these functions? In Citus' case a full PlannedStmt is generated on the master node, to combine the data generated on worker nodes (where the bog standard postgres planner is used). It's not the only way to do things, but I don't see why the approach would be entirely invalidated by the pathification work. We do provide the planner_hook, so restricting the toolbox for that to do something useful, doesn't seem like a necessarily good idea. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Andres Freund writes: > I see that you made a lot of formerly externally visible make_* routines > static. The Citus extension (which will be open source in a few days) > uses several of these (make_agg, make_sort_from_sortclauses, make_limit > at least). > Can we please re-consider making these static? That was intentional: in my opinion, nothing outside createplan.c ought to be making Plan nodes anymore. The expectation is that you make a Path describing what you want. Can you explain why, in the new planner structure, it would be sane to have external callers of these functions? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Hi Tom, On 2016-02-28 15:03:28 -0500, Tom Lane wrote: > diff --git a/src/include/optimizer/planmain.h > b/src/include/optimizer/planmain.h > index eaa642b..cd7338a 100644 > *** a/src/include/optimizer/planmain.h > --- b/src/include/optimizer/planmain.h > *** extern RelOptInfo *query_planner(Planner > *** 43,102 >* prototypes for plan/planagg.c >*/ > extern void preprocess_minmax_aggregates(PlannerInfo *root, List *tlist); > - extern Plan *optimize_minmax_aggregates(PlannerInfo *root, List *tlist, > -const AggClauseCosts > *aggcosts, Path *best_path); > > /* >* prototypes for plan/createplan.c >*/ > extern Plan *create_plan(PlannerInfo *root, Path *best_path); > - extern SubqueryScan *make_subqueryscan(List *qptlist, List *qpqual, > - Index scanrelid, Plan *subplan); > extern ForeignScan *make_foreignscan(List *qptlist, List *qpqual, >Index scanrelid, List *fdw_exprs, List > *fdw_private, >List *fdw_scan_tlist, List *fdw_recheck_quals, >Plan *outer_plan); > - extern Append *make_append(List *appendplans, List *tlist); > - extern RecursiveUnion *make_recursive_union(List *tlist, > - Plan *lefttree, Plan *righttree, int > wtParam, > - List *distinctList, long numGroups); > - extern Sort *make_sort_from_pathkeys(PlannerInfo *root, Plan *lefttree, > - List *pathkeys, double > limit_tuples); > - extern Sort *make_sort_from_sortclauses(PlannerInfo *root, List *sortcls, > -Plan *lefttree); > - extern Sort *make_sort_from_groupcols(PlannerInfo *root, List *groupcls, > - AttrNumber *grpColIdx, Plan > *lefttree); > - extern Agg *make_agg(PlannerInfo *root, List *tlist, List *qual, > - AggStrategy aggstrategy, const AggClauseCosts *aggcosts, > - int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators, > - List *groupingSets, long numGroups, bool combineStates, > - bool finalizeAggs, Plan *lefttree); > - extern WindowAgg *make_windowagg(PlannerInfo *root, List *tlist, > -List *windowFuncs, Index winref, > -int partNumCols, AttrNumber *partColIdx, Oid > *partOperators, > -int ordNumCols, AttrNumber *ordColIdx, Oid > *ordOperators, > -int frameOptions, Node *startOffset, Node *endOffset, > -Plan *lefttree); > - extern Group *make_group(PlannerInfo *root, List *tlist, List *qual, > -int numGroupCols, AttrNumber *grpColIdx, Oid *grpOperators, > -double numGroups, > -Plan *lefttree); > extern Plan *materialize_finished_plan(Plan *subplan); > ! extern Unique *make_unique(Plan *lefttree, List *distinctList); > ! extern LockRows *make_lockrows(Plan *lefttree, List *rowMarks, int > epqParam); > ! extern Limit *make_limit(Plan *lefttree, Node *limitOffset, Node > *limitCount, > !int64 offset_est, int64 count_est); > ! extern SetOp *make_setop(SetOpCmd cmd, SetOpStrategy strategy, Plan > *lefttree, > !List *distinctList, AttrNumber flagColIdx, int firstFlag, > !long numGroups, double outputRows); > ! extern Result *make_result(PlannerInfo *root, List *tlist, > ! Node *resconstantqual, Plan *subplan); > ! extern ModifyTable *make_modifytable(PlannerInfo *root, > ! CmdType operation, bool canSetTag, > ! Index nominalRelation, > ! List *resultRelations, List *subplans, > ! List *withCheckOptionLists, List > *returningLists, > ! List *rowMarks, OnConflictExpr *onconflict, > int epqParam); > extern bool is_projection_capable_plan(Plan *plan); I see that you made a lot of formerly externally visible make_* routines static. The Citus extension (which will be open source in a few days) uses several of these (make_agg, make_sort_from_sortclauses, make_limit at least). Can we please re-consider making these static? It's fine if their parameter list changes from release to release, that's easy enough to adjust to, and it's easily visible. But having to copy all of these into extension code is more than a bit painful; especially make_sort_from_sortclauses. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Wed, Mar 9, 2016 at 5:47 PM, Tom Lane wrote: > Alexander Korotkov writes: > > I have a question about Sort path. AFAICS this question wasn't mentioned > in > > the upthread discussion. > > We're producing Sort plans in two ways: from explicit Sort paths, and > from > > other paths which implicitly assumes sorting (like MergeJoin path). > > Would it be better to produce Sort plan only from explicit Sort path? > Thus, > > MergeJoin path would add explicit children Sort paths. That would be more > > unified way. > > Meh. I think the only real effect that would have is to make it slower to > build MergeJoin paths (and in a typical join problem, we build a lot of > those). The entire point of the Path/Plan distinction is to postpone > until createplan.c any work that's not necessary to arrive at a cost > estimate. So the way MergeJoinPath works now seems fine to me. > > > I ask about this from point of view of my "Partial Sort" patch. The > absence > > of implicit sorts would help to make this patch more simple and clean. > > There are other implicit sorts besides those. Admittedly, the efficiency > argument for not making the sort explicit in UniquePath or MergeAppendPath > is a lot weaker than it is for MergeJoin, just because those are less > common path types. But getting rid of the behavior entirely would be > a lot of work, and I'm not convinced it'd be an improvement. > Thank you for the explanation. I'll try to rebase "Partial Sort" leaving this situation as is. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Upper planner pathification
On Wed, Mar 9, 2016 at 12:33 PM, Tom Lane wrote: > Alvaro Herrera writes: >> Tom Lane wrote: >>> Yeah, fixed. I had assumed that the existing coding in create_gather_plan >>> was OK, because it looked like it was right for a non-projecting node. >>> But actually Gather can project (why though?), so it's not right. > >> This looks related: >> https://www.postgresql.org/message-id/CA%2BTgmoai9Ruhwk61110rY4cNAzoO6npsFEOaEKjM7Zz8i3evHg%40mail.gmail.com > > Ah, thanks for remembering that thread; I'd forgotten it. > > Gather is a bit weird, because although it can project (and needs to, > per the example of needing to compute a non-parallel-safe function), > you would rather push down as much work as possible to the child node; > and doing so is semantically OK for parallel-safe functions. (Pushing > functions down past a Sort node, for a counterexample, is not so OK > if you are concerned about function evaluation order, or even number > of executions.) > > In the current code structure it would perhaps be reasonable to teach > apply_projection_to_path about that --- although this would require > logic to separate parallel-safe and non-parallel-safe subexpressions, > which doesn't quite seem like something apply_projection_to_path > should be doing. I think for v1 it would be fine to make this all-or-nothing; that's what I had in mind to do. That is, if the entire tlist is parallel-safe, push it all down. If not, let the workers just return the necessary Vars and have Gather compute the final tlist. Now, obviously one could do better. If the tlist contains several expensive functions that are parallel-safe and one inexpensive function that isn't, you'd obviously prefer to compute the parallel-safe stuff in the workers and just compute the one inexpensive thing in the leader. But that's significantly more complicated and I'm not sure it's worth spending a lot of time getting it right just now. Not that I'm complaining if you feel the urge. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Alvaro Herrera writes: > Tom Lane wrote: >> Yeah, fixed. I had assumed that the existing coding in create_gather_plan >> was OK, because it looked like it was right for a non-projecting node. >> But actually Gather can project (why though?), so it's not right. > This looks related: > https://www.postgresql.org/message-id/CA%2BTgmoai9Ruhwk61110rY4cNAzoO6npsFEOaEKjM7Zz8i3evHg%40mail.gmail.com Ah, thanks for remembering that thread; I'd forgotten it. Gather is a bit weird, because although it can project (and needs to, per the example of needing to compute a non-parallel-safe function), you would rather push down as much work as possible to the child node; and doing so is semantically OK for parallel-safe functions. (Pushing functions down past a Sort node, for a counterexample, is not so OK if you are concerned about function evaluation order, or even number of executions.) In the current code structure it would perhaps be reasonable to teach apply_projection_to_path about that --- although this would require logic to separate parallel-safe and non-parallel-safe subexpressions, which doesn't quite seem like something apply_projection_to_path should be doing. This seems rather closely related to the discussion around Konstantin Knizhnik's patch to delay function evaluations to after the ORDER BY sort when possible/useful. What I think that patch should be doing is for grouping_planner (or subroutines thereof) to classify tlist items as volatile or expensive or not and generate pre-sort and post-sort targetlists appropriately. That's okay for a top-level feature like ORDER BY, but it doesn't work for Gather which can appear much further down in the plan. So maybe apply_projection_to_path is the only feasible place. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Wed, Mar 9, 2016 at 12:07 PM, Alvaro Herrera wrote: > Tom Lane wrote: >> Amit Kapila writes: > >> > Without setting max_parallel_degree, it works fine and generate the >> > appropriate results. Here the issue seems to be that the code in >> > grouping_planner doesn't apply the required PathTarget to Path below Gather >> > Path due to which when we generate target list for scan plan, >> >> Yeah, fixed. I had assumed that the existing coding in create_gather_plan >> was OK, because it looked like it was right for a non-projecting node. >> But actually Gather can project (why though?), so it's not right. > > This looks related: > https://www.postgresql.org/message-id/CA%2BTgmoai9Ruhwk61110rY4cNAzoO6npsFEOaEKjM7Zz8i3evHg%40mail.gmail.com I actually wrote a patch for this: http://www.postgresql.org/message-id/CA+TgmoaqWH8W35c7pssuufxOO=axneqen4da_bkeqaa3se3...@mail.gmail.com I assume it no longer applies :-( but I think it would be good to get this into 9.6. It's a pretty simple optimization with a lot to recommend it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Tom Lane wrote: > Amit Kapila writes: > > Without setting max_parallel_degree, it works fine and generate the > > appropriate results. Here the issue seems to be that the code in > > grouping_planner doesn't apply the required PathTarget to Path below Gather > > Path due to which when we generate target list for scan plan, > > Yeah, fixed. I had assumed that the existing coding in create_gather_plan > was OK, because it looked like it was right for a non-projecting node. > But actually Gather can project (why though?), so it's not right. This looks related: https://www.postgresql.org/message-id/CA%2BTgmoai9Ruhwk61110rY4cNAzoO6npsFEOaEKjM7Zz8i3evHg%40mail.gmail.com -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Amit Kapila writes: > On latest commit-51c0f63e, I am seeing some issues w.r.t parallel query. > Consider a below case: > create table t1(c1 int, c2 char(1000)); > insert into t1 values(generate_series(1,30),''); > analyze t1; > set max_parallel_degree=2; > postgres=# explain select c1, count(c1) from t1 where c1 < 1000 group by c1; > ERROR: ORDER/GROUP BY expression not found in targetlist Hm. That does not speak well for the coverage of the "run the regression tests with force_parallel_mode enabled" testing approach. > Without setting max_parallel_degree, it works fine and generate the > appropriate results. Here the issue seems to be that the code in > grouping_planner doesn't apply the required PathTarget to Path below Gather > Path due to which when we generate target list for scan plan, Yeah, fixed. I had assumed that the existing coding in create_gather_plan was OK, because it looked like it was right for a non-projecting node. But actually Gather can project (why though?), so it's not right. > Approach-2 > -- > Always generate a tlist in Gather plan as we do for many other cases. I > think this approach will also resolve the issue but haven't tried yet. I think this is the correct fix. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Alexander Korotkov writes: > I have a question about Sort path. AFAICS this question wasn't mentioned in > the upthread discussion. > We're producing Sort plans in two ways: from explicit Sort paths, and from > other paths which implicitly assumes sorting (like MergeJoin path). > Would it be better to produce Sort plan only from explicit Sort path? Thus, > MergeJoin path would add explicit children Sort paths. That would be more > unified way. Meh. I think the only real effect that would have is to make it slower to build MergeJoin paths (and in a typical join problem, we build a lot of those). The entire point of the Path/Plan distinction is to postpone until createplan.c any work that's not necessary to arrive at a cost estimate. So the way MergeJoinPath works now seems fine to me. > I ask about this from point of view of my "Partial Sort" patch. The absence > of implicit sorts would help to make this patch more simple and clean. There are other implicit sorts besides those. Admittedly, the efficiency argument for not making the sort explicit in UniquePath or MergeAppendPath is a lot weaker than it is for MergeJoin, just because those are less common path types. But getting rid of the behavior entirely would be a lot of work, and I'm not convinced it'd be an improvement. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Tue, Mar 8, 2016 at 2:31 AM, Tom Lane wrote: > > I wrote: > >> Attached is a version that addresses today's concerns, and also finishes > >> filling in the loose ends I'd left before, such as documentation and > >> outfuncs.c support. I think this is in a committable state now, though > >> I plan to read through the whole thing again. > > The extra read-through located some minor bugs, mainly places where I'd > forgotten to ensure that Path cost info was transposed into the generated > Plan. That would only have the cosmetic effect that EXPLAIN would print > zeroes for estimated costs, and since we only use EXPLAIN COSTS OFF in > the regression tests, no test failures ensued :-(. > > I've pushed it now; we'll soon see if the buildfarm finds any problems. > On latest commit-51c0f63e, I am seeing some issues w.r.t parallel query. Consider a below case: create table t1(c1 int, c2 char(1000)); insert into t1 values(generate_series(1,30),''); analyze t1; set max_parallel_degree=2; postgres=# explain select c1, count(c1) from t1 where c1 < 1000 group by c1; ERROR: ORDER/GROUP BY expression not found in targetlist Without setting max_parallel_degree, it works fine and generate the appropriate results. Here the issue seems to be that the code in grouping_planner doesn't apply the required PathTarget to Path below Gather Path due to which when we generate target list for scan plan, it misses the required information which in this case is sortgrouprefs and the same target list is then propagated for upper nodes which eventually leads to the above mentioned failure. Due to same reason, if the target list contains some expression, it will give wrong results when parallel query is used. I could see below ways to solve this issue. Approach-1 - First way to solve this issue is to jam the PathTarget for partial paths similar to what we are doing for Paths and I have verified that resolves the issue, the patch for same is attached with this mail. However, it won't work as-is, because this will make target lists pushed to workers as we have applied them below Gather Path which we don't want if the target list has any parallel unsafe functions. To make this approach work, we need to ensure that we jam the PathTarget for partial paths (or generate a projection) only if they contain parallel-safe expressions. Now while creating Gather Plan (create_gather_plan()), we need to ensure in some way that if path below doesn't generate the required tlist, then it should generate it's own tlist rather than using it from subplan. Approach-2 -- Always generate a tlist in Gather plan as we do for many other cases. I think this approach will also resolve the issue but haven't tried yet. Approach-1 has a benefit that it can allow to push target lists to workers which will result in speed-up of parallel queries especially for cases when target list contain costly expressions, so I am slightly inclined to follow that even though that looks more work. Thoughts? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com apply_tlist_partial_path_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Hi, Tom! I have a question about Sort path. AFAICS this question wasn't mentioned in the upthread discussion. We're producing Sort plans in two ways: from explicit Sort paths, and from other paths which implicitly assumes sorting (like MergeJoin path). Would it be better to produce Sort plan only from explicit Sort path? Thus, MergeJoin path would add explicit children Sort paths. That would be more unified way. I ask about this from point of view of my "Partial Sort" patch. The absence of implicit sorts would help to make this patch more simple and clean. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] WIP: Upper planner pathification
I wrote: > There might be some other things we could do to provide a fast-path for > particularly trivial cases. I wanted to look into that before the code or tests had drifted far enough to make comparisons dubious. Attached is a simple patch that lets grouping_planner fall out with a minimum amount of work if the query is just "SELECT expression(s)", and a couple of scatter plots of regression test query timing. The first plot compares pre-pathification timing with HEAD+this patch, and the second compares HEAD with HEAD+this patch. I had hoped to see more of a benefit, actually, but it seems like this might be enough of a win to be worth committing. Probably the main argument against it is that we'd have to remember to maintain the list of things-to-check-before-using-the-fast-path. Thoughts? regards, tom lane diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c index 5fc8e5b..151f27f 100644 *** a/src/backend/optimizer/plan/planner.c --- b/src/backend/optimizer/plan/planner.c *** grouping_planner(PlannerInfo *root, bool *** 1458,1463 --- 1458,1504 parse->sortClause, tlist); } + else if (parse->rtable == NIL && + parse->commandType == CMD_SELECT && + parse->jointree->quals == NULL && + !parse->hasAggs && !parse->hasWindowFuncs && + parse->groupClause == NIL && parse->groupingSets == NIL && + !root->hasHavingQual && + parse->distinctClause == NIL && + parse->sortClause == NIL && + parse->limitOffset == NULL && parse->limitCount == NULL) + { + /* + * Trivial "SELECT expression(s)" query. This case would be handled + * correctly by the code below, but it comes up often enough to be + * worth having a simplified fast-path for. Need only create a Result + * path with the desired targetlist and shove it into the final rel. + */ + Path *path; + double tlist_rows; + + /* Need not bother with preprocess_targetlist in a SELECT */ + root->processed_tlist = tlist; + + final_rel = fetch_upper_rel(root, UPPERREL_FINAL, NULL); + + path = (Path *) create_result_path(final_rel, + create_pathtarget(root, tlist), + NIL); + + /* We do take the trouble to fix the rows estimate for SRFs, though */ + tlist_rows = tlist_returns_set_rows(tlist); + if (tlist_rows > 1) + { + path->total_cost += path->rows * (tlist_rows - 1) * + cpu_tuple_cost / 2; + path->rows *= tlist_rows; + } + + add_path(final_rel, path); + + return; + } else { /* No set operations, do regular planning */ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 8 March 2016 at 10:01, Tom Lane wrote: > I've pushed it now; we'll soon see if the buildfarm finds any problems. Fantastic! I'm looking forward to all the future optimisation opportunities that this opens up. Thanks for making this happen. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
I wrote: >> Attached is a version that addresses today's concerns, and also finishes >> filling in the loose ends I'd left before, such as documentation and >> outfuncs.c support. I think this is in a committable state now, though >> I plan to read through the whole thing again. The extra read-through located some minor bugs, mainly places where I'd forgotten to ensure that Path cost info was transposed into the generated Plan. That would only have the cosmetic effect that EXPLAIN would print zeroes for estimated costs, and since we only use EXPLAIN COSTS OFF in the regression tests, no test failures ensued :-(. I've pushed it now; we'll soon see if the buildfarm finds any problems. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 7, 2016 at 11:09 AM, Tom Lane wrote: > Robert Haas writes: >> The currently-committed code generates paths where nested loops and >> hash joins get pushed beneath the Gather node, but does not generate >> paths where merge joins have been pushed beneath the Gather node. And >> the reason I didn't try to generate those paths is because I believe >> they will almost always suck. > > That's a perfectly reasonable engineering judgment (and especially so > for a first release). What I'd really like to see documented is how > that conclusion is related, or not, to the rules about how path nodes > should be decorated with parallel_safe, parallel_degree, etc annotations. > The existing documentation is barely adequate to explain what those fields > mean for primitive scan nodes; it's impossible for anyone but you to > know what they are supposed to mean for joins and higher-level nodes. It is unrelated, I think. If a path is parallel_safe, that is supposed to mean that, in theory, the plan generated from that path could be executed within a worker without crashing the server, giving wrong answers, or otherwise destroying the world. However, as an optimization, if we've already decided that the query can't ever be parallelized at all, for example because it contains write operations, we don't bother trying to set the parallel_safe flags correctly; they're just all false. Generally, a path is definitely not parallel_safe if it contains a path that is not parallel_safe; if all of the paths under it are parallel_safe, then it is also parallel_safe except when there's some unsafe computation added at the new level -- like an unsafe join qual between two safe relations. If a path is parallel_aware, that means that the plan generated by that path wants to do something different when run in parallel mode. Presumably, the difference will be that the plan will establish some shared state in the dynamic shared memory segment created to service that parallel query. For example, a sequential scan can be parallel_aware, which will allow that sequential scan to be simultaneously executed in multiple processes and return only a subset of the rows in each. A non-parallel_aware sequential scan can still be used in parallel mode; for example, consider this: Gather -> Hash Join -> Parallel Seq Scan -> Hash -> Seq Scan The outer seq scan needs to return each row only once across all workers, but the inner seq scan needs to return every row in every worker. Therefore, the outer seq scan is flagged parallel_aware and displays in the EXPLAIN output as "Parallel Seq Scan", while the inner one is not and does not. parallel_degree is a horrible kludge whose function is to communicate to the Gather node the number of workers for which it should budget. Currently, every parallel plan's leftmost descendent will be a Parallel Seq Scan, and that Parallel Seq Scan will estimate the degree of parallelism that makes sense using a simplistic, bone-headed algorithm based on the size of the table. That then bubbles up the plan tree to the Gather node, which adopts the Parallel Seq Scan's suggestion. I really hope this is going to go away eventually and be replaced by something better. Really, I think we should try to figure out the amount of parallelizable work (CPU, and effective I/O parallelism) that is going to be required per leftmost tuple and compare that to the amount of non-parallelizable work (presumably, the reset of the I/O cost) and use that to judge the optimal parallel degree. But I think that's going to take a lot of work to get right, and it ties into some other issues, like the fact that we estimate a scan of a 1MB table to have the same cost per page as a scan of a 10TB table even though the former should probably be assumed to be fully cached and the latter should probably be assumed not to be cached at all. I think a lot more thought is needed here than I've given it thus far, and one of the things that I'm hoping is that people will test parallel query and actually report the results so that we can accumulate some data on which problems are most important to go fix and, also, what the shape of those fixes might look like. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 7, 2016 at 3:37 PM, Robert Haas wrote: > > The currently-committed code generates paths where nested loops and > hash joins get pushed beneath the Gather node, but does not generate > paths where merge joins have been pushed beneath the Gather node. And > the reason I didn't try to generate those paths is because I believe > they will almost always suck. As of now, what we know how to do is > build a partial path for a join by joining a partial path for the > outer input rel against an ordinary path for the inner rel. That > means that the work of generating the inner rel has to be redone in > each worker. That's not a problem if we've got something like a > nested loop with a parameterized inner index scan, because that sort > of plan redoes all the work for every row anyway. It is a problem for > a hash join, but it's not too hard for it to be worthwhile anyway if > the build table is small. For a merge join, though, it seems rather > unpromising. It's really doubtful that we want each worker to > independently sort the inner rel and then have them join their own > subset of the outer rel against their own copy of the sort. *Maybe* > it could win if the inner path is an index scan, but I wasn't really > sure that would come up and be a win often enough to be worth the cost > of generating the path. We tend to only use merge joins when both of > the relations involved are large, and index-scanning a large relation > tends to lose to sorting it. So it just seemed like a dead end. This is the first message on this subthread that actually gave me a feeling I understood the issue under discussion. It explains the distinction between plans that are parallel-safe and plans that would actually do something different under a parallel worker -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > The currently-committed code generates paths where nested loops and > hash joins get pushed beneath the Gather node, but does not generate > paths where merge joins have been pushed beneath the Gather node. And > the reason I didn't try to generate those paths is because I believe > they will almost always suck. That's a perfectly reasonable engineering judgment (and especially so for a first release). What I'd really like to see documented is how that conclusion is related, or not, to the rules about how path nodes should be decorated with parallel_safe, parallel_degree, etc annotations. The existing documentation is barely adequate to explain what those fields mean for primitive scan nodes; it's impossible for anyone but you to know what they are supposed to mean for joins and higher-level nodes. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Sun, Mar 6, 2016 at 10:32 AM, Tom Lane wrote: > Amit Kapila writes: >> On Sat, Mar 5, 2016 at 10:11 PM, Tom Lane wrote: >>> Is there some reason why hash and nestloop are safe but merge isn't? > >> I think it is because we consider to pushdown hash and nestloop to workers, >> but not merge join and the reason for not pushing mergejoin is that >> currently we don't have executor support for same, more work is needed >> there. > > If that's true, then mergejoin paths ought to be marked parallel-unsafe > explicitly (with a comment as to why), not just silently reduced to degree > zero in a manner that looks more like an oversight than anything > intentional. > > I also note that the regression tests pass with this patch and parallel > mode forced, which seems unlikely if allowing a parallel worker to execute > a join works for only two out of the three join types. And checking the > git history for nodeHashjoin.c, nodeHash.c, and nodeNestloop.c shows no > evidence that any of those files have been touched for parallel query, > so it's pretty hard to see a reason why those would work in parallel > queries but nodeMergejoin.c not. > > I still say the code as it stands is merely a copy-and-pasteo. I might call it a thinko rather than a copy-and-pasteo, but basically, you are right and Amit is wrong. I feel confident making that statement because I wrote the code, so I think I'm well-positioned to judge whether I did a particular thing on purpose or not. The currently-committed code generates paths where nested loops and hash joins get pushed beneath the Gather node, but does not generate paths where merge joins have been pushed beneath the Gather node. And the reason I didn't try to generate those paths is because I believe they will almost always suck. As of now, what we know how to do is build a partial path for a join by joining a partial path for the outer input rel against an ordinary path for the inner rel. That means that the work of generating the inner rel has to be redone in each worker. That's not a problem if we've got something like a nested loop with a parameterized inner index scan, because that sort of plan redoes all the work for every row anyway. It is a problem for a hash join, but it's not too hard for it to be worthwhile anyway if the build table is small. For a merge join, though, it seems rather unpromising. It's really doubtful that we want each worker to independently sort the inner rel and then have them join their own subset of the outer rel against their own copy of the sort. *Maybe* it could win if the inner path is an index scan, but I wasn't really sure that would come up and be a win often enough to be worth the cost of generating the path. We tend to only use merge joins when both of the relations involved are large, and index-scanning a large relation tends to lose to sorting it. So it just seemed like a dead end. Now, if somebody comes along with a patch to create partial merge join paths and shows that it improves performance on some class of queries I haven't thought about, I am not going to complain. And in the long run, I would like to have a facility to partition both relations on the fly and then have the workers do a merge join per partition. That's one of the two standard algorithms in the literature for parallel join - hash join is the other. I'm quite certain that's an important piece of technology to develop, but it's a huge project unto itself. My priority for 9.6 is to have a user-visible feature that takes the infrastructure that we already have as far as it reasonably can go. Building new infrastructure will have to wait for a future release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 7, 2016 at 11:52 AM, Peter Geoghegan wrote: > > On Sun, Mar 6, 2016 at 9:59 PM, Tom Lane wrote: > > Perhaps it was intentional when written, but if Robert's advice is correct > > that the new upper-planner path nodes should copy up parallel_degree from > > their children, then it cannot be the case that parallel_degree>0 in a > > node above the scan level implies that that node type has any special > > behavior for parallelism. > > Right. > > > I continue to bemoan the lack of documentation about what these fields > > mean. > > Point taken and if Robert doesn't feel otherwise, I can try to write a patch to explain the newly added fields. > > As far as I can find, the sum total of the documentation about > > this field is > > > > int parallel_degree; /* desired parallel degree; 0 = not parallel */ > > While it doesn't particularly relate to parallel joins, I've expressed > a general concern about the max_parallel_degree GUC that I think is > worth considering again: > > http://www.postgresql.org/message-id/cam3swzrs1mtvrkkasy1xbshgzxkd6-hnxx3gq7x-p-dz0zt...@mail.gmail.com > > In summary, I think it's surprising that a max_parallel_degree of 1 > doesn't disable parallel workers entirely. > I have responded on the thread where you have raised that point with my thoughts, discussing it here on a separate point can dilute the purpose of this thread. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
Peter Geoghegan writes: > In summary, I think it's surprising that a max_parallel_degree of 1 > doesn't disable parallel workers entirely. Yeah, it's not exactly clear what "1" should mean that's different from "0", in this case. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Sun, Mar 6, 2016 at 9:59 PM, Tom Lane wrote: > Perhaps it was intentional when written, but if Robert's advice is correct > that the new upper-planner path nodes should copy up parallel_degree from > their children, then it cannot be the case that parallel_degree>0 in a > node above the scan level implies that that node type has any special > behavior for parallelism. > > I continue to bemoan the lack of documentation about what these fields > mean. As far as I can find, the sum total of the documentation about > this field is > > int parallel_degree; /* desired parallel degree; 0 = not parallel > */ While it doesn't particularly relate to parallel joins, I've expressed a general concern about the max_parallel_degree GUC that I think is worth considering again: http://www.postgresql.org/message-id/cam3swzrs1mtvrkkasy1xbshgzxkd6-hnxx3gq7x-p-dz0zt...@mail.gmail.com In summary, I think it's surprising that a max_parallel_degree of 1 doesn't disable parallel workers entirely. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Amit Kapila writes: > On Mon, Mar 7, 2016 at 9:13 AM, Tom Lane wrote: >> AFAICS, those are about generating partial paths, which is a completely >> different thing from whether a regular path is parallel-safe or not. > Okay, but the main point I wanted to convey is that I think setting > parallel_degree = 0 in mergejoin path is not necessarily a copy-paste > error. Perhaps it was intentional when written, but if Robert's advice is correct that the new upper-planner path nodes should copy up parallel_degree from their children, then it cannot be the case that parallel_degree>0 in a node above the scan level implies that that node type has any special behavior for parallelism. I continue to bemoan the lack of documentation about what these fields mean. As far as I can find, the sum total of the documentation about this field is int parallel_degree; /* desired parallel degree; 0 = not parallel */ Last I checked, "degree" meant 1/360'th part of a circle, or some fraction of the distance between water's freezing and boiling points, or possibly an award for academic achievement. So I'm not really going to hold still for any claim that this is self-explanatory. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Mon, Mar 7, 2016 at 9:13 AM, Tom Lane wrote: > > Amit Kapila writes: > Is there some reason why hash and nestloop are safe but merge isn't? > > > To make hash and nestloop work in parallel queries, we just push those > > nodes below gather node. Refer code > > paths match_unsorted_outer()->consider_parallel_nestloop() > > and hash_inner_and_outer()->try_partial_hashjoin_path(). > > AFAICS, those are about generating partial paths, which is a completely > different thing from whether a regular path is parallel-safe or not. > Okay, but the main point I wanted to convey is that I think setting parallel_degree = 0 in mergejoin path is not necessarily a copy-paste error. If you see the other path generation code like create_index_path(), create_samplescan_path(), etc. there we set parallel_degree to zero even though the parallel-safety is determined based on reloptinfo. And I don't see any use of setting parallel_degree for path which can't be pushed beneath gather (aka can be executed by multiple workers). With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
Amit Kapila writes: Is there some reason why hash and nestloop are safe but merge isn't? > To make hash and nestloop work in parallel queries, we just push those > nodes below gather node. Refer code > paths match_unsorted_outer()->consider_parallel_nestloop() > and hash_inner_and_outer()->try_partial_hashjoin_path(). AFAICS, those are about generating partial paths, which is a completely different thing from whether a regular path is parallel-safe or not. (I think, anyway. It would be nice if this stuff were documented better. It would also likely be a good thing if partial-ness of a path were marked in the path itself, which does not seem to be the case now. Or at the very least, it'd be a good thing if create_foo_path and the underlying costing functions were told it was a partial path, because how the heck can they generate sane cost numbers without that knowledge?) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Sun, Mar 6, 2016 at 9:02 PM, Tom Lane wrote: > > Amit Kapila writes: > > On Sat, Mar 5, 2016 at 10:11 PM, Tom Lane wrote: > >> Is there some reason why hash and nestloop are safe but merge isn't? > > > I think it is because we consider to pushdown hash and nestloop to workers, > > but not merge join and the reason for not pushing mergejoin is that > > currently we don't have executor support for same, more work is needed > > there. > > If that's true, then mergejoin paths ought to be marked parallel-unsafe > explicitly (with a comment as to why), not just silently reduced to degree > zero in a manner that looks more like an oversight than anything > intentional. > > I also note that the regression tests pass with this patch and parallel > mode forced, which seems unlikely if allowing a parallel worker to execute > a join works for only two out of the three join types. And checking the > git history for nodeHashjoin.c, nodeHash.c, and nodeNestloop.c shows no > evidence that any of those files have been touched for parallel query, > so it's pretty hard to see a reason why those would work in parallel > queries but nodeMergejoin.c not. > To make hash and nestloop work in parallel queries, we just push those nodes below gather node. Refer code paths match_unsorted_outer()->consider_parallel_nestloop() and hash_inner_and_outer()->try_partial_hashjoin_path(). Once the partial path for hash and nestloop gets generated in above code path, we generate gather path in function add_paths_to_joinrel()->generate_gather_paths(). You won't find the code to generate partial path for merge join. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
Amit Kapila writes: > On Sat, Mar 5, 2016 at 10:11 PM, Tom Lane wrote: >> Is there some reason why hash and nestloop are safe but merge isn't? > I think it is because we consider to pushdown hash and nestloop to workers, > but not merge join and the reason for not pushing mergejoin is that > currently we don't have executor support for same, more work is needed > there. If that's true, then mergejoin paths ought to be marked parallel-unsafe explicitly (with a comment as to why), not just silently reduced to degree zero in a manner that looks more like an oversight than anything intentional. I also note that the regression tests pass with this patch and parallel mode forced, which seems unlikely if allowing a parallel worker to execute a join works for only two out of the three join types. And checking the git history for nodeHashjoin.c, nodeHash.c, and nodeNestloop.c shows no evidence that any of those files have been touched for parallel query, so it's pretty hard to see a reason why those would work in parallel queries but nodeMergejoin.c not. I still say the code as it stands is merely a copy-and-pasteo. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Sat, Mar 5, 2016 at 10:11 PM, Tom Lane wrote: > > Amit Kapila writes: > > On Fri, Mar 4, 2016 at 11:31 PM, Tom Lane wrote: > >> (BTW, I found what seemed to be a couple of pre-existing bugs of > >> the same kind, eg create_mergejoin_path was different from the > >> other two kinds of join as to setting parallel_degree.) > > > I think the reason for keeping parallel_degree as zero for mergejoin path > > is that currently it can't participate in parallelism. > > Is there some reason why hash and nestloop are safe but merge isn't? > I think it is because we consider to pushdown hash and nestloop to workers, but not merge join and the reason for not pushing mergejoin is that currently we don't have executor support for same, more work is needed there. I think even if we set parallel_degree as you are doing in patch for merge join is harmless, but ideally there is no need to set it as far as what we support today in terms of parallelism. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
I wrote: > Attached is a version that addresses today's concerns, and also finishes > filling in the loose ends I'd left before, such as documentation and > outfuncs.c support. I think this is in a committable state now, though > I plan to read through the whole thing again. Sigh, forgot to press the magic button. Let's try that again. regards, tom lane upper-planner-pathification-3.patch.gz Description: upper-planner-pathification-3.patch.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Attached is a version that addresses today's concerns, and also finishes filling in the loose ends I'd left before, such as documentation and outfuncs.c support. I think this is in a committable state now, though I plan to read through the whole thing again. regards, tom lane #application/x-gzip; name="upper-planner-pathification-3.patch.gz" [upper-planner-pathification-3.patch.gz] /home/tgl/pgsql/upper-planner-pathification-3.patch.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
I wrote: > Amit Kapila writes: >> I think here we should use rel->consider_parallel to set parallel_safe as >> is done in create_mergejoin_path. > Well, the "rel" is going to be an upperrel that will have been > manufactured by fetch_upper_rel, and it will contain no useful > information about parallelism, so I'm not real sure what that > would buy. Ah, after further study I found where this issue is handled for joinrels. I think you're probably right that it'd be a good idea to include rel->consider_parallel when setting parallel_safe in upper paths. In the short term that will have the effect of marking all upper paths as parallel-unsafe, but that's at least a safe default that we can improve later. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Greg Stark writes: > What query is that lone data point that took 8ms instead of 6ms to > plan in both charts (assuming it's the same data point)? Ah, sorry, I should probably have spent a little more time on making those charts. That thing you're looking at isn't a data point, it's gnuplot showing what symbol it used for the data from this file. Here's another one with the axes adjusted to keep the labels away from the data, and with both sets of data overlaid. This makes it a bit clearer that the UPPEREL_INITIAL removal moved the distribution slightly but measurably at the bottom of the time range. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Sat, Mar 5, 2016 at 6:09 PM, Tom Lane wrote: > There might be some other things we could do to provide a fast-path for > particularly trivial cases. But on the whole I think this plot shows that > there's no systematic problem, and indeed not really a lot of change at > all. Amazing data. What query is that lone data point that took 8ms instead of 6ms to plan in both charts (assuming it's the same data point)? -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
I wrote: > Robert Haas writes: >> One idea might be to run a whole bunch of queries and record all of >> the planning times, and then run them all again and compare somehow. >> Maybe the regression tests, for example. > That sounds like something we could do pretty easily, though interpreting > the results might be nontrivial. I spent some time on this project. I modified the code to log the runtime of standard_planner along with decompiled text of the passed-in query tree. I then ran the regression tests ten times with cassert-off builds of both current HEAD and HEAD+pathification patch, and grouped all the numbers for log entries with identical texts. (FYI, there are around 1 distinguishable queries in the current tests, most planned only once or twice, but some as many as 2900 times.) I had intended to look at the averages within each group, but that was awfully noisy; I ended up looking at the minimum times, after discarding a few groups with particularly awful standard deviations. I theorize that a substantial part of the variation in the runtime depends on whether catalog entries consulted by the planner have been sucked into syscache or not, and thus that using the minimum is a reasonable way to eliminate cache-loading effects, which surely ought not be considered in this comparison. Here is a scatter plot, on log axes, of planning times in milliseconds with HEAD (x axis) vs those with patch (y axis): The most noticeable thing about that is that the worst percentage-wise cases appear near the bottom end of the range. And indeed inspection of individual entries showed that trivial cases like SELECT (ROW(1, 2) < ROW(1, 3)) AS "true" were hurting the most percentage-wise. After some study I decided that the only thing that could explain that was the two rounds of construct-an-upper-rel-and-add-paths-to-it happening in grouping_planner. I was able to get rid of one of them by discarding the notion of UPPERREL_INITIAL altogether, and instead having the code apply the desired tlist in-place, like this: sub_target = make_subplanTargetList(root, tlist, &groupColIdx); /* * Forcibly apply that tlist to all the Paths for the scan/join rel. * * In principle we should re-run set_cheapest() here to identify the * cheapest path, but it seems unlikely that adding the same tlist * eval costs to all the paths would change that, so we don't bother. * Instead, just assume that the cheapest-startup and cheapest-total * paths remain so. (There should be no parameterized paths anymore, * so we needn't worry about updating cheapest_parameterized_paths.) */ foreach(lc, current_rel->pathlist) { Path *subpath = (Path *) lfirst(lc); Path *path; Assert(subpath->param_info == NULL); path = apply_projection_to_path(root, current_rel, subpath, sub_target); /* If we had to add a Result, path is different from subpath */ if (path != subpath) { lfirst(lc) = path; if (subpath == current_rel->cheapest_startup_path) current_rel->cheapest_startup_path = path; if (subpath == current_rel->cheapest_total_path) current_rel->cheapest_total_path = path; } } With that fixed, the scatter plot looks like: There might be some other things we could do to provide a fast-path for particularly trivial cases. But on the whole I think this plot shows that there's no systematic problem, and indeed not really a lot of change at all. I won't bother to repost the modified patch right now, but will spend some time filling in the missing pieces first. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Amit Kapila writes: > On Fri, Mar 4, 2016 at 11:31 PM, Tom Lane wrote: >> (BTW, I found what seemed to be a couple of pre-existing bugs of >> the same kind, eg create_mergejoin_path was different from the >> other two kinds of join as to setting parallel_degree.) > I think the reason for keeping parallel_degree as zero for mergejoin path > is that currently it can't participate in parallelism. Is there some reason why hash and nestloop are safe but merge isn't? >> + RecursiveUnionPath * >> + create_recursiveunion_path(PlannerInfo *root, >> + ... >> + pathnode->path.parallel_safe = >> + leftpath->parallel_safe && rightpath->parallel_safe; > I think here we should use rel->consider_parallel to set parallel_safe as > is done in create_mergejoin_path. Well, the "rel" is going to be an upperrel that will have been manufactured by fetch_upper_rel, and it will contain no useful information about parallelism, so I'm not real sure what that would buy. This does bring up what seems to me probably a pre-existing bug in the parallel query planning stuff: what about parallel-safe vs parallel-unsafe functions in join quals, or other expressions that have to be evaluated at places above the scan level? I would expect to see upper path nodes needing to account for parallel-safety of the specific expressions they need to execute. However, the existing join path node types don't have any provision for this, so I did not feel that it was incumbent on me to fix it for the path node types I'm adding. > + * It's only needed atop a node that doesn't support projection > "needed atop a node", seems unclear to me, typo? Seems perfectly good English to me. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Sat, Mar 5, 2016 at 4:51 PM, Amit Kapila wrote: > > On Fri, Mar 4, 2016 at 11:31 PM, Tom Lane wrote: > > > > OK, here is a version that I think addresses all of the recent comments: > > > > * Fixed handling of parallel-query fields in new path node types. > > (BTW, I found what seemed to be a couple of pre-existing bugs of > > the same kind, eg create_mergejoin_path was different from the > > other two kinds of join as to setting parallel_degree.) > > > > I think the reason for keeping parallel_degree as zero for mergejoin path is that currently it can't participate in parallelism. > > > *** create_unique_path(PlannerInfo *root, Re > *** 1440,1446 > pathnode->path.param_info = subpath- > >param_info; > pathnode->path.parallel_aware = false; > pathnode->path.parallel_safe = subpath->parallel_safe; > ! > pathnode->path.parallel_degree = 0; > > /* > * Assume the output is unsorted, since we don't necessarily > have pathkeys > --- 1445,1451 > pathnode->path.param_info = subpath->param_info; > pathnode- > >path.parallel_aware = false; > pathnode->path.parallel_safe = subpath->parallel_safe; > ! pathnode- > >path.parallel_degree = subpath->parallel_degree; > > Similar to reason for merge join path, I think this should also be set to 0. > > Similarly for LimitPath, parallel_degree should be set to 0. > Oops, It seems Robert has made comment upthread that we should set parallel_degree from subpath except for write paths, so I think the above comment can be ignored. With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
On Fri, Mar 4, 2016 at 11:31 PM, Tom Lane wrote: > > OK, here is a version that I think addresses all of the recent comments: > > * Fixed handling of parallel-query fields in new path node types. > (BTW, I found what seemed to be a couple of pre-existing bugs of > the same kind, eg create_mergejoin_path was different from the > other two kinds of join as to setting parallel_degree.) > I think the reason for keeping parallel_degree as zero for mergejoin path is that currently it can't participate in parallelism. *** create_unique_path(PlannerInfo *root, Re *** 1440,1446 pathnode->path.param_info = subpath- >param_info; pathnode->path.parallel_aware = false; pathnode->path.parallel_safe = subpath->parallel_safe; ! pathnode->path.parallel_degree = 0; /* * Assume the output is unsorted, since we don't necessarily have pathkeys --- 1445,1451 pathnode->path.param_info = subpath->param_info; pathnode- >path.parallel_aware = false; pathnode->path.parallel_safe = subpath->parallel_safe; ! pathnode- >path.parallel_degree = subpath->parallel_degree; Similar to reason for merge join path, I think this should also be set to 0. Similarly for LimitPath, parallel_degree should be set to 0. + RecursiveUnionPath * + create_recursiveunion_path(PlannerInfo *root, + RelOptInfo *rel, + Path *leftpath, + Path *rightpath, + PathTarget *target, + List *distinctList, + int wtParam, + double numGroups) + { + RecursiveUnionPath *pathnode = makeNode(RecursiveUnionPath); + + pathnode->path.pathtype = T_RecursiveUnion; + pathnode->path.parent = rel; + pathnode->path.pathtarget = target; + /* For now, assume we are above any joins, so no parameterization */ + pathnode->path.param_info = NULL; + pathnode->path.parallel_aware = false; + pathnode->path.parallel_safe = + leftpath->parallel_safe && rightpath->parallel_safe; I think here we should use rel->consider_parallel to set parallel_safe as is done in create_mergejoin_path. + * It's only needed atop a node that doesn't support projection "needed atop a node", seems unclear to me, typo? With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] WIP: Upper planner pathification
OK, here is a version that I think addresses all of the recent comments: * I refactored the grouping-sets stuff as suggested by Robert and David. The GroupingSetsPath code is now used *only* when there are grouping sets, otherwise what you get is a plain AGG_SORTED AggPath. This allowed removal of a boatload of weird corner cases in the GroupingSets code path, so it was a good change. (Fundamentally, that's cleaning up some questionable coding in the grouping sets patch rather than fixing anything directly related to pathification, but I like the code better now.) * I refactored the handling of targetlists in createplan.c. After some reflection I decided that the disuse_physical_tlist callers fell into three separate categories: those that actually needed the exact requested tlist to be returned, those that wanted non-bloated tuples because they were going to put them into sort or hash storage, and those that needed grouping columns to be properly labeled. The new approach is to pass down a "flags" word that specifies which if any of these cases apply at a specific plan level. use_physical_tlist now always makes the right decision to start with, and disuse_physical_tlist is gone entirely, which should make things a bit faster since we won't uselessly construct and discard physical tlists. The missing logic from make_subplanTargetList and locate_grouping_columns is reincarnated in the physical-tlist code. * Added explicit limit/offset fields to LimitPath, as requested by Teodor. * Removed SortPath.sortgroupclauses. * Fixed handling of parallel-query fields in new path node types. (BTW, I found what seemed to be a couple of pre-existing bugs of the same kind, eg create_mergejoin_path was different from the other two kinds of join as to setting parallel_degree.) What remains to be done, IMV: * Performance testing as per yesterday's discussion. * Debug support in outfuncs.c and print_path() for new node types. * Clean up unfinished work on function header comments. * Write some documentation about how FDWs might use this. I'll work on the performance testing next. Barring unsatisfactory results from that, I think this could be committable in a couple of days. regards, tom lane upper-planner-pathification-2.patch.gz Description: upper-planner-pathification-2.patch.gz -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Tue, Mar 1, 2016 at 3:02 PM, Tom Lane wrote: > Well, my point is that no such path would have been generated if the > subquery hadn't had an internal reason to consider sorting on b.id. > The "accidental" part of this is that the subquery's GROUP BY key > matches what the outer query needs as a mergejoin key. Hm. I can't seem to get it to generate such plans here. This is after disabling hashjoin or else it doesn't want to do a sort at all: postgres=# explain select * from (select * from v group by i) as v1 natural join (select * from v group by i) as v2; QUERY PLAN --- Merge Join (cost=107.04..111.04 rows=200 width=4) Merge Cond: (v.i = v_1.i) -> Sort (cost=53.52..54.02 rows=200 width=4) Sort Key: v.i -> HashAggregate (cost=41.88..43.88 rows=200 width=4) Group Key: v.i -> Seq Scan on v (cost=0.00..35.50 rows=2550 width=4) -> Sort (cost=53.52..54.02 rows=200 width=4) Sort Key: v_1.i -> HashAggregate (cost=41.88..43.88 rows=200 width=4) Group Key: v_1.i -> Seq Scan on v v_1 (cost=0.00..35.50 rows=2550 width=4) (12 rows) I'm trying to construct a torture case where it generates lots more paths than HEAD. I don't think a percent or two on planning time is significant but if there are cases where the planning time increases quickly that would be something to code against. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Hi, On 03/03/2016 01:10 AM, Tom Lane wrote: Hmmm ... I'm now wondering about the "measurement error" theory. I tried to repeat this measurement locally, focusing on the select-only number since that should have a higher ratio of planning time to execution. Test setup: cassert-off build pgbench -i -s 100 sudo cpupower frequency-set --governor performance repeat 3 times: pgbench -c 4 -j 4 -P 5 -T 60 -S HEAD: tps = 32508.217002 (excluding connections establishing) tps = 33081.402766 tps = 32520.859913 average of 3: 32703 tps WITH PATCH: tps = 32815.922160 (excluding connections establishing) tps = 33312.149718 tps = 32784.527489 average of 3: 32970 tps (Hardware: dual quad-core Xeon E5-2609, running current RHEL6) So I see no evidence for a slowdown on pgbench's SELECT queries. Anybody else want to check performance on simple scan/join queries? I did a small test today, mostly out of curiosity. And I think that while the results are a bit noisy, there's a clear slowdown. But it's extremely small, like ~0.5% for median/average, so I'd say it's negligible. I used the i5-2500k machine I use for this kind of tests, and I did 30 runs of pgbench -S -T 60 pgbench on scale 10 database (analyzed and frozen before), both with and without the patch applied. FWIW the machine is one of the least noisy ones when it comes to such benchmarks. The results look like this: masterpatcheddiff --- median16531 16459 -0.4% average 16526 16473 -0.3% It's a bit more obvious when doing a scatter plot of the results (after sorting each time series) with master on x-axis and patched on y-axis. Ideally, we'd get the data points on a diagonal (no change) or above it (speedup), but the points are actually below. See the chart attached. But I do agree this is mostly at the noise level, pretty good for a first cut that intentionally does not include any hacks. It's definitely way below the benefits of this patch, +1 to applying this sooner than later. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On 4 March 2016 at 09:29, Robert Haas wrote: > On Thu, Mar 3, 2016 at 2:19 PM, Tom Lane wrote: >> This leads me to the conclusion that all these new node types should >> set parallel_aware to false and copy up the other two fields from the >> child, except for LockRows and ModifyTable which should set them all >> to false/0. Correct? If so, I'll go fix. > > Well, I'd probably bubble it up regardless. The fact that the overall > plan writes data will cause everything in the plan to have > parallel_safe = false and parallel_degree = 0, so you'll get the same > outcome either way. However, that way, if writes eventually become > safe, then this won't need adjusting. But it doesn't really matter > much; feel free to do it as you say if you prefer. This would help me too. I hit a problem with this when adding Group Parallel Aggregate, as the sort path is conditionally added in create_grouping_paths() which causes the parallel_degree for the subpath which is later passed into create_agg_path() to become 0. in create_agg_path() I was using the parallel_degree variable to determine if I should construct a normal serial agg path, or a chain of partial agg -> gather -> final agg paths. To get around the problem I added a parallel_degree parameter to create_agg_path() of which that parameter so far seems to only exist in create_seqscan_path(), which naturally is a bit special as it's a "leaf node" in the path tree. Propagating these would mean I could remove that parameter again. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > On Thu, Mar 3, 2016 at 2:19 PM, Tom Lane wrote: >> Thanks. As I told Teodor last night, I can't reproduce a performance >> issue here with pgbench-style queries. Do you have any thoughts about >> how we might satisfy ourselves whether there is or isn't a performance >> problem? > One idea might be to run a whole bunch of queries and record all of > the planning times, and then run them all again and compare somehow. > Maybe the regression tests, for example. That sounds like something we could do pretty easily, though interpreting the results might be nontrivial. There will, I expect, be a mix of queries that do get slower as well as those that don't. As long as the former are just queries that we hope to get some plan-quality win on, I think that's an acceptable result ... but we'll need to record enough data to tell what we're looking at. For starters, I'll try logging whether the query has setops, grouping, aggregation, window functions, and try to measure planning time change in each of those categories. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
On Thu, Mar 3, 2016 at 2:19 PM, Tom Lane wrote: > Robert Haas writes: >> Thanks for working on this. Some review comments: > >> - I think all of the new path creation functions should bubble up >> parallel_degree from their subpath. > > Ah, thanks, I didn't have any clue how to set that (though in my defense, > the documentation about it is next to nonexistent). Just to confirm, > I assume the rules are: > > * parallel_aware: indicates whether the plan node itself has any > parallelism behavior > > * parallel_safe: indicates that the entire plan tree rooted at this > node is safe to execute in a parallel worker > > * parallel_degree: indicates number of parallel threads potentially > useful for this plan tree (0 if not parallel-safe) Right. > This leads me to the conclusion that all these new node types should > set parallel_aware to false and copy up the other two fields from the > child, except for LockRows and ModifyTable which should set them all > to false/0. Correct? If so, I'll go fix. Well, I'd probably bubble it up regardless. The fact that the overall plan writes data will cause everything in the plan to have parallel_safe = false and parallel_degree = 0, so you'll get the same outcome either way. However, that way, if writes eventually become safe, then this won't need adjusting. But it doesn't really matter much; feel free to do it as you say if you prefer. >> - RollupPath seems like a poor choice of name, if nothing else. You >> would expect that it would be related to GROUP BY ROLLUP() but of >> course that's really the same thing as GROUP BY GROUPING SETS () or >> GROUP BY CUBE (), and the fundamental operation is actually GROUPING >> SETS, not ROLLUP. > > As I noted to David, that thing seems to me to be in need of refactoring, > but I'd just as soon leave untangling the grouping-sets mess for later. > I don't mind substituting a different name if you have a better idea, > but don't really want to do more work than that right now. Seems reasonable. GroupingSetsPath? MultipleGroupingPath? RepeatedGroupingPath? >> - A related point that is more connected to this patch is that you've >> added 13 (!) new calls to disuse_physical_tlist, and 8 of those are >> marked with /* XXX hack: need original tlist with sortgroupref marking >> */. I don't quite understand what's going on there. I think if we're >> going to be stuck with that hack we at least need some comments >> explaining what is going on there. What has caused us to suddenly >> need these calls when we didn't before, and why these places and not >> some others? > > Yeah, that's a hack to get things working. The problem is that these node > types need to be able to identify sort/group columns in their inputs, but > if the child has switched to a "physical tlist" then the ressortgroupref > marking isn't there, and indeed the needed column might not be there at > all if it's a computed expression not a Var. So what I did for the moment > was to force the inputs back to their nominal tlists. In the old code we > didn't have this problem because none of the upper-level plan node types > could see a physical tlist unless make_subplanTargetList had allowed it, > and then we applied locate_grouping_columns() to re-identify the grouping > columns. That logic probably needs to be transposed into createplan.c, > but I've not taken the time yet to figure out exactly how. I don't know > if it's reasonable to do that separately from rethinking how the whole > disuse_physical_tlist thing works. I don't know either, but it doesn't seem good to let this linger too long. >> - For SortPath, you mention that the SortGroupClauses representation >> isn't currently used. It's not clear to me that it ever would be; >> what would be the case where that might be useful? At any rate, I'd >> be inclined to rip it out for now; you can always put it back later. > > Yeah, I was dithering about that. It seems like createplan.c now has > a few too many ways to identify sort/group columns, and I was hoping > to consolidate them somehow. That might lead to wanting to use > SortGroupClauses not PathKeys in some cases. But until that's worked > out, I agree the extra field is useless and we can just drop it. OK. >> - create_distinct_paths() disables the hash path if it seems like it >> would exceed work_mem, unless the sort path isn't viable. But there's >> something that feels a bit uncomfortable about this. Suppose the sort >> path isn't viable but some other kind of future path is viable. It >> seems like it would be better to restructure this a bit so that the >> decision about whether to add the hash path is based on whether there >> are any other paths in the rel when we reach the bottom of the >> function. create_grouping_paths() has a similar issue. > > OK, I'll take a look. Quite a lot of these functions can probably stand > more local rearrangements; I've been mainly concerned about getting the > overall structure right. Un
Re: [HACKERS] WIP: Upper planner pathification
Robert Haas writes: > Thanks for working on this. Some review comments: > - I think all of the new path creation functions should bubble up > parallel_degree from their subpath. Ah, thanks, I didn't have any clue how to set that (though in my defense, the documentation about it is next to nonexistent). Just to confirm, I assume the rules are: * parallel_aware: indicates whether the plan node itself has any parallelism behavior * parallel_safe: indicates that the entire plan tree rooted at this node is safe to execute in a parallel worker * parallel_degree: indicates number of parallel threads potentially useful for this plan tree (0 if not parallel-safe) This leads me to the conclusion that all these new node types should set parallel_aware to false and copy up the other two fields from the child, except for LockRows and ModifyTable which should set them all to false/0. Correct? If so, I'll go fix. > - RollupPath seems like a poor choice of name, if nothing else. You > would expect that it would be related to GROUP BY ROLLUP() but of > course that's really the same thing as GROUP BY GROUPING SETS () or > GROUP BY CUBE (), and the fundamental operation is actually GROUPING > SETS, not ROLLUP. As I noted to David, that thing seems to me to be in need of refactoring, but I'd just as soon leave untangling the grouping-sets mess for later. I don't mind substituting a different name if you have a better idea, but don't really want to do more work than that right now. > - It's not entirely related to this patch, but I'm starting to wonder > if we've made the wrong bet about target lists. It seems to me that > there's a huge difference between a projection which simply throws > away columns we don't need and one which actually computes something, > and maybe those cases ought to be treated differently instead of > saying "well, it's a target list". It strikes me that there are > probably execution-time optimizations that are possible in the former > case, and maybe a more compact representation of the projection > operation as well. I can't shake the feeling that our extensive use > of lists can't be the best thing ever for performance. We do already have the "physical tlist" optimization. I agree that there's more work to be done here, but again would rather leave that to a later patch. > - A related point that is more connected to this patch is that you've > added 13 (!) new calls to disuse_physical_tlist, and 8 of those are > marked with /* XXX hack: need original tlist with sortgroupref marking > */. I don't quite understand what's going on there. I think if we're > going to be stuck with that hack we at least need some comments > explaining what is going on there. What has caused us to suddenly > need these calls when we didn't before, and why these places and not > some others? Yeah, that's a hack to get things working. The problem is that these node types need to be able to identify sort/group columns in their inputs, but if the child has switched to a "physical tlist" then the ressortgroupref marking isn't there, and indeed the needed column might not be there at all if it's a computed expression not a Var. So what I did for the moment was to force the inputs back to their nominal tlists. In the old code we didn't have this problem because none of the upper-level plan node types could see a physical tlist unless make_subplanTargetList had allowed it, and then we applied locate_grouping_columns() to re-identify the grouping columns. That logic probably needs to be transposed into createplan.c, but I've not taken the time yet to figure out exactly how. I don't know if it's reasonable to do that separately from rethinking how the whole disuse_physical_tlist thing works. > - For SortPath, you mention that the SortGroupClauses representation > isn't currently used. It's not clear to me that it ever would be; > what would be the case where that might be useful? At any rate, I'd > be inclined to rip it out for now; you can always put it back later. Yeah, I was dithering about that. It seems like createplan.c now has a few too many ways to identify sort/group columns, and I was hoping to consolidate them somehow. That might lead to wanting to use SortGroupClauses not PathKeys in some cases. But until that's worked out, I agree the extra field is useless and we can just drop it. > - create_distinct_paths() disables the hash path if it seems like it > would exceed work_mem, unless the sort path isn't viable. But there's > something that feels a bit uncomfortable about this. Suppose the sort > path isn't viable but some other kind of future path is viable. It > seems like it would be better to restructure this a bit so that the > decision about whether to add the hash path is based on whether there > are any other paths in the rel when we reach the bottom of the > function. create_grouping_paths() has a similar issue. OK, I'll take a look. Quite a lot of these functions ca
Re: [HACKERS] WIP: Upper planner pathification
On Sun, Feb 28, 2016 at 3:03 PM, Tom Lane wrote: > I'm pretty pleased with the way this turned out. grouping_planner() is > about half the length it was before, and much more straightforward IMO. > planagg.c no longer seems like a complete hack; it's a reasonable > prototype for injecting nontraditional implementation paths into > aggregation or other late planner stages, and grouping_planner() doesn't > need to know about it. Thanks for working on this. Some review comments: - I think all of the new path creation functions should bubble up parallel_degree from their subpath. It's fine for LockRows to use 0 for now, because we have no hope of supporting write operations in parallel mode any time soon, but it's one less thing to change later. The others should really all use the subpath's parallel degree. Actually, they should use the subpath's parallel degree or maybe a larger number if they add a lot of CPU work to the query, but don't have any principled way to model that right now, so just copying the value is probably as good as we're going to do for the moment. - RollupPath seems like a poor choice of name, if nothing else. You would expect that it would be related to GROUP BY ROLLUP() but of course that's really the same thing as GROUP BY GROUPING SETS () or GROUP BY CUBE (), and the fundamental operation is actually GROUPING SETS, not ROLLUP. - It's not entirely related to this patch, but I'm starting to wonder if we've made the wrong bet about target lists. It seems to me that there's a huge difference between a projection which simply throws away columns we don't need and one which actually computes something, and maybe those cases ought to be treated differently instead of saying "well, it's a target list". It strikes me that there are probably execution-time optimizations that are possible in the former case, and maybe a more compact representation of the projection operation as well. I can't shake the feeling that our extensive use of lists can't be the best thing ever for performance. - A related point that is more connected to this patch is that you've added 13 (!) new calls to disuse_physical_tlist, and 8 of those are marked with /* XXX hack: need original tlist with sortgroupref marking */. I don't quite understand what's going on there. I think if we're going to be stuck with that hack we at least need some comments explaining what is going on there. What has caused us to suddenly need these calls when we didn't before, and why these places and not some others? - For SortPath, you mention that the SortGroupClauses representation isn't currently used. It's not clear to me that it ever would be; what would be the case where that might be useful? At any rate, I'd be inclined to rip it out for now; you can always put it back later. - create_distinct_paths() disables the hash path if it seems like it would exceed work_mem, unless the sort path isn't viable. But there's something that feels a bit uncomfortable about this. Suppose the sort path isn't viable but some other kind of future path is viable. It seems like it would be better to restructure this a bit so that the decision about whether to add the hash path is based on whether there are any other paths in the rel when we reach the bottom of the function. create_grouping_paths() has a similar issue. In general, and I'm sure this is not a huge surprise, most of this looks very good to me. I think the design is sound and that, if the performance is OK, we ought to move forward with it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Upper planner pathification
David Rowley writes: > My gripe is that I've added the required code to build the parallel > group aggregate to create_agg_path() already, but since Group > Aggregate uses the RollupPath I'm forced to add code in > create_rollup_plan() which manually stacks up Plan nodes rather than > just dealing with Paths and create_plan() and its recursive call > magic. Yeah, RollupPath was something I did to be expeditious rather than something I'm particularly in love with. It's OK for a first version, I think, but we'd need to refactor it if we were to consider more than one implementation strategy for a rollup. Also it's pretty ugly that the code makes a RollupPath even when a basic AggPath is what is meant; that's a leftover from the fact that current HEAD goes through build_grouping_chain() even for simple aggregation without grouping sets. One point to consider is that we don't want the create_foo_path stage to do much more than what's necessary to get a cost/rows estimate. So in general, postponing as much work as possible to createplan.c is a good thing. But we don't want the Path representation to leave any interesting planning choices implicit. My general feeling about this is that I don't want it to be a blocker for getting the basic patch in, but I'll happily consider further refactoring of individual path types once we're over that hump. If you wanted to start on a follow-on patch to deal with this particular issue, be my guest. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers