Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Mar 7, 2017 at 10:28 PM, Robert Haaswrote: >> Apart from this, there was one problem in match_unsorted_outer (in >> v10), Basically, if inner_cheapest_total was not parallel_safe then I >> was always getting parallel safe inner. But, we should not do anything >> if jointype was JOIN_UNIQUE_INNER, so fixed that also. > > This version looks fine to me, so committed. Thank you. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Mar 7, 2017 at 11:38 AM, Dilip Kumarwrote: > On Tue, Mar 7, 2017 at 7:47 PM, Robert Haas wrote: >> You're right to be confused, because that seems to be a bug in the >> existing code. There seems to be no guarantee that the cheapest >> parallel-safe path will be in the cheapest_parameterized_paths list. >> I'll go fix that. > > Okay, Done the simmilar changes in sort_inner_and_outer as well. >> >> As a matter of style, when testing a value of type "bool", write if >> (x) or if (!x). When testing a variable of some other type, say int, >> write if (x == 0) or if (x != 0) or whatever. > > Done > > Apart from this, there was one problem in match_unsorted_outer (in > v10), Basically, if inner_cheapest_total was not parallel_safe then I > was always getting parallel safe inner. But, we should not do anything > if jointype was JOIN_UNIQUE_INNER, so fixed that also. This version looks fine to me, so committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Mar 7, 2017 at 7:47 PM, Robert Haaswrote: > You're right to be confused, because that seems to be a bug in the > existing code. There seems to be no guarantee that the cheapest > parallel-safe path will be in the cheapest_parameterized_paths list. > I'll go fix that. Okay, Done the simmilar changes in sort_inner_and_outer as well. > > As a matter of style, when testing a value of type "bool", write if > (x) or if (!x). When testing a variable of some other type, say int, > write if (x == 0) or if (x != 0) or whatever. Done Apart from this, there was one problem in match_unsorted_outer (in v10), Basically, if inner_cheapest_total was not parallel_safe then I was always getting parallel safe inner. But, we should not do anything if jointype was JOIN_UNIQUE_INNER, so fixed that also. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v11.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Mar 7, 2017 at 5:16 AM, Dilip Kumarwrote: > I am confused about whether to call > "get_cheapest_parallel_safe_total_inner" with > innerrel->cheapest_parameterized_paths like we do in case of > hash_inner_and_outer or with > innerrel->pathlist. The reason behind I am calling this with complete > pathlist (innerrel->pathlist) is that inside generate_mergejoin_paths > it calls "get_cheapest_path_for_pathkeys" for complete pathlist and if > we decide not to call consider_parallel_mergejoin if the cheapest > parallel safe path in innerrel->cheapest_parameterized_paths is NULL > then it will not be fair (we might have some parallel safe paths in > innerrel->pathlist). You're right to be confused, because that seems to be a bug in the existing code. There seems to be no guarantee that the cheapest parallel-safe path will be in the cheapest_parameterized_paths list. I'll go fix that. As a matter of style, when testing a value of type "bool", write if (x) or if (!x). When testing a variable of some other type, say int, write if (x == 0) or if (x != 0) or whatever. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Mar 7, 2017 at 5:21 AM, Robert Haaswrote: > +/* Can't do anything else if inner path needs to be unique'd */ > +if (save_jointype == JOIN_UNIQUE_INNER) > +return; > > Right after this, you should try_partial_mergejoin_path() with the > result of get_cheapest_parallel_safe_total_inner(), just like > sort_inner_and_outer() already does. > > I think with some work you could merge generate_mergejoin_paths with > generate_partial_mergejoin_paths instead of duplicating all the code. > They're not really that different, and you could reduce the > differences further if you made try_mergejoin_path() take an > additional is_partial argument and pass the call to > try_partial_mergejoin_path() if it's true. The various bits of > special handling related to join types that aren't supported in > parallel queries aren't going to help you in parallel mode, but they > won't hurt either. This bit: > > +/* Generate partial path if inner is parallel safe. */ > +if (inner_cheapest_total->parallel_safe) > +try_partial_mergejoin_path(root, > > ...is really not right. That value is being passed down from > consider_parallel_mergejoin(), which is getting it from > match_unsorted_outer(), which really shouldn't be calling this code in > the first place with a path that's not parallel-safe. What it ought > to do is (a) pass inner_cheapest_total if that's non-NULL and > parallel-safe, or else (b) call > get_cheapest_parallel_safe_total_inner() and pass that value, (c) > unless that's NULL also in which case it should skip the call > altogether. I have tried to address above comments in the latest patch, But there is one part of the patch which I am not quite sure yet. So still this patch is not a final one. + * + * If inner_cheapest_total is not NULL or not a parallel safe path then + * find the cheapest total parallel safe path. + */ + if (inner_cheapest_total == NULL || + inner_cheapest_total->parallel_safe == false) + { + inner_cheapest_total = + get_cheapest_parallel_safe_total_inner(innerrel->pathlist); + } + + if (inner_cheapest_total) + consider_parallel_mergejoin(root, joinrel, outerrel, innerrel, + save_jointype, extra, + inner_cheapest_total); I am confused about whether to call "get_cheapest_parallel_safe_total_inner" with innerrel->cheapest_parameterized_paths like we do in case of hash_inner_and_outer or with innerrel->pathlist. The reason behind I am calling this with complete pathlist (innerrel->pathlist) is that inside generate_mergejoin_paths it calls "get_cheapest_path_for_pathkeys" for complete pathlist and if we decide not to call consider_parallel_mergejoin if the cheapest parallel safe path in innerrel->cheapest_parameterized_paths is NULL then it will not be fair (we might have some parallel safe paths in innerrel->pathlist). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v10.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Mon, Mar 6, 2017 at 8:15 AM, Dilip Kumarwrote: > On Fri, Mar 3, 2017 at 3:57 PM, Robert Haas wrote: >> I'm not happy with the way this patch can just happen to latch on to a >> path that's not parallel-safe rather than one that is and then just >> give up on a merge join in that case. I already made this argument in >> https://www.postgresql.org/message-id/ca+tgmobdw2au1jq5l4ysa2zhqfma-qnvd7zfazbjwm3c0ys...@mail.gmail.com >> and my opinion hasn't changed. I've subsequently come to realize that >> this problem is more widespread that I initially realized: not only do >> sort_inner_and_outer() and generate_partial_mergejoin_paths() give up >> without searching for the cheapest parallel-safe path, but the latter >> later calls get_cheapest_path_for_pathkeys() and then just pretends it >> didn't find anything unless the result is parallel-safe. This makes >> no sense to me: the cheapest parallel-safe path could be 1% more >> expensive than the cheapest path of any kind, but because it's not the >> cheapest we arbitrarily skip it, even though parallelizing more of the >> tree might leave us way ahead overall. > > for sort_inner_and_outer I have followed the same logic what > hash_inner_and_outer is doing. Also moved the logic of selecting the > cheapest_safe_inner out of the pathkey loop. +/* Generate partial path if inner is parallel safe. */ +if (inner_cheapest_total->parallel_safe) +try_partial_mergejoin_path(root, + joinrel, + outerpath, + inner_cheapest_total, + merge_pathkeys, + mergeclauses, + NIL, + innersortkeys, + jointype, + extra); + +/* Can't do anything else if inner path needs to be unique'd */ +if (save_jointype == JOIN_UNIQUE_INNER) +return; Right after this, you should try_partial_mergejoin_path() with the result of get_cheapest_parallel_safe_total_inner(), just like sort_inner_and_outer() already does. I think with some work you could merge generate_mergejoin_paths with generate_partial_mergejoin_paths instead of duplicating all the code. They're not really that different, and you could reduce the differences further if you made try_mergejoin_path() take an additional is_partial argument and pass the call to try_partial_mergejoin_path() if it's true. The various bits of special handling related to join types that aren't supported in parallel queries aren't going to help you in parallel mode, but they won't hurt either. This bit: +/* Generate partial path if inner is parallel safe. */ +if (inner_cheapest_total->parallel_safe) +try_partial_mergejoin_path(root, ...is really not right. That value is being passed down from consider_parallel_mergejoin(), which is getting it from match_unsorted_outer(), which really shouldn't be calling this code in the first place with a path that's not parallel-safe. What it ought to do is (a) pass inner_cheapest_total if that's non-NULL and parallel-safe, or else (b) call get_cheapest_parallel_safe_total_inner() and pass that value, (c) unless that's NULL also in which case it should skip the call altogether. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Mar 3, 2017 at 3:57 PM, Robert Haaswrote: > I'm not happy with the way this patch can just happen to latch on to a > path that's not parallel-safe rather than one that is and then just > give up on a merge join in that case. I already made this argument in > https://www.postgresql.org/message-id/ca+tgmobdw2au1jq5l4ysa2zhqfma-qnvd7zfazbjwm3c0ys...@mail.gmail.com > and my opinion hasn't changed. I've subsequently come to realize that > this problem is more widespread that I initially realized: not only do > sort_inner_and_outer() and generate_partial_mergejoin_paths() give up > without searching for the cheapest parallel-safe path, but the latter > later calls get_cheapest_path_for_pathkeys() and then just pretends it > didn't find anything unless the result is parallel-safe. This makes > no sense to me: the cheapest parallel-safe path could be 1% more > expensive than the cheapest path of any kind, but because it's not the > cheapest we arbitrarily skip it, even though parallelizing more of the > tree might leave us way ahead overall. for sort_inner_and_outer I have followed the same logic what hash_inner_and_outer is doing. Also moved the logic of selecting the cheapest_safe_inner out of the pathkey loop. > > I suggest that we add an additional "bool require_parallel_safe" > argument to get_cheapest_path_for_pathkeys(); when false, the function > will behave as at present, but when true, it will skip paths that are > not parallel-safe. And similarly have a function to find the cheapest > parallel-safe path if the first one in the list isn't it. See > attached draft patch. Then this code can search for the correct path > instead of searching using the wrong criteria and then giving up if it > doesn't get the result it wants. Done as suggested. Also, rebased the path_search_for_mergejoin on head and updated the comments of get_cheapest_path_for_pathkeys for new argument. > > +if (!(joinrel->consider_parallel && > + save_jointype != JOIN_UNIQUE_OUTER && > + save_jointype != JOIN_FULL && > + save_jointype != JOIN_RIGHT && > + outerrel->partial_pathlist != NIL && > + bms_is_empty(joinrel->lateral_relids))) > > I would distribute the negation, so that this reads if > (!joinrel->consider_parallel || save_jointype == JOIN_UNIQUE_OUTER || > save_jointype == JOIN_FULL || ...). Or maybe better yet, just drop > the ! and the return that follows and put the > consider_parallel_nestloop and consider_parallel_mergejoin calls > inside the if-block. Done > > You could Assert(nestjoinOK) instead of testing it, although I guess > it doesn't really matter. left as it is. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v9.patch Description: Binary data path-search-for-mergejoin-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Mar 3, 2017 at 9:47 PM, Dilip Kumarwrote: > On Fri, Mar 3, 2017 at 3:57 PM, Robert Haas wrote: >> I'm not happy with the way this patch can just happen to latch on to a >> path that's not parallel-safe rather than one that is and then just >> give up on a merge join in that case. I already made this argument in >> https://www.postgresql.org/message-id/ca+tgmobdw2au1jq5l4ysa2zhqfma-qnvd7zfazbjwm3c0ys...@mail.gmail.com >> and my opinion hasn't changed. > > I think last time I did not understand the depth of the problem > completely and only fixed from one aspect that in > generate_partial_mergejoin_paths if cheapest_total_inner or > cheapest_startup_inner is not parallel safe then consider the current > path if that are parallel safe and now I got it how it was completely > wrong. > > I have one question for fixing it in sort_inner_and_outer, Currently, > we don't consider the parameterized paths for merge join except the > case when cheapest total paths itself is parameterized, So IIUC, for > creating partial path we will check if cheapest_total_inner path is > not parallel safe then we will find cheapest inner parallel safe path > using your new API get_cheapest_parallel_safe_total_inner, and we will > proceed with this paths if this is not directly parameterized by > outer? Remember that partial paths can't be parameterized, and here we're trying to build a partial mergejoin path. The outer input obviously won't be parameterized, since it's partial, and it can't satisfy any parameterization of the inner relation either, because only nested loops can do that. So it only makes sense to try merge-joining a partial path against a completely unparameterized, parallel-safe inner path. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Mar 3, 2017 at 3:57 PM, Robert Haaswrote: > I'm not happy with the way this patch can just happen to latch on to a > path that's not parallel-safe rather than one that is and then just > give up on a merge join in that case. I already made this argument in > https://www.postgresql.org/message-id/ca+tgmobdw2au1jq5l4ysa2zhqfma-qnvd7zfazbjwm3c0ys...@mail.gmail.com > and my opinion hasn't changed. I think last time I did not understand the depth of the problem completely and only fixed from one aspect that in generate_partial_mergejoin_paths if cheapest_total_inner or cheapest_startup_inner is not parallel safe then consider the current path if that are parallel safe and now I got it how it was completely wrong. I have one question for fixing it in sort_inner_and_outer, Currently, we don't consider the parameterized paths for merge join except the case when cheapest total paths itself is parameterized, So IIUC, for creating partial path we will check if cheapest_total_inner path is not parallel safe then we will find cheapest inner parallel safe path using your new API get_cheapest_parallel_safe_total_inner, and we will proceed with this paths if this is not directly parameterized by outer? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Wed, Mar 1, 2017 at 12:01 PM, Amit Kapilawrote: > On Wed, Mar 1, 2017 at 11:24 AM, Dilip Kumar wrote: >> On Wed, Mar 1, 2017 at 11:13 AM, Amit Kapila wrote: >>> I think for now we can keep the parallel safety check for cheapest >>> inner path, though it will be of use only for the very first time we >>> compare the paths in that loop. I am not sure if there is any other >>> better way to handle the same. >> >> Done that way. > > Thanks, your patch looks good to me. I'm not happy with the way this patch can just happen to latch on to a path that's not parallel-safe rather than one that is and then just give up on a merge join in that case. I already made this argument in https://www.postgresql.org/message-id/ca+tgmobdw2au1jq5l4ysa2zhqfma-qnvd7zfazbjwm3c0ys...@mail.gmail.com and my opinion hasn't changed. I've subsequently come to realize that this problem is more widespread that I initially realized: not only do sort_inner_and_outer() and generate_partial_mergejoin_paths() give up without searching for the cheapest parallel-safe path, but the latter later calls get_cheapest_path_for_pathkeys() and then just pretends it didn't find anything unless the result is parallel-safe. This makes no sense to me: the cheapest parallel-safe path could be 1% more expensive than the cheapest path of any kind, but because it's not the cheapest we arbitrarily skip it, even though parallelizing more of the tree might leave us way ahead overall. I suggest that we add an additional "bool require_parallel_safe" argument to get_cheapest_path_for_pathkeys(); when false, the function will behave as at present, but when true, it will skip paths that are not parallel-safe. And similarly have a function to find the cheapest parallel-safe path if the first one in the list isn't it. See attached draft patch. Then this code can search for the correct path instead of searching using the wrong criteria and then giving up if it doesn't get the result it wants. +if (!(joinrel->consider_parallel && + save_jointype != JOIN_UNIQUE_OUTER && + save_jointype != JOIN_FULL && + save_jointype != JOIN_RIGHT && + outerrel->partial_pathlist != NIL && + bms_is_empty(joinrel->lateral_relids))) I would distribute the negation, so that this reads if (!joinrel->consider_parallel || save_jointype == JOIN_UNIQUE_OUTER || save_jointype == JOIN_FULL || ...). Or maybe better yet, just drop the ! and the return that follows and put the consider_parallel_nestloop and consider_parallel_mergejoin calls inside the if-block. You could Assert(nestjoinOK) instead of testing it, although I guess it doesn't really matter. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company path-search-for-mergejoin-v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Wed, Mar 1, 2017 at 11:24 AM, Dilip Kumarwrote: > On Wed, Mar 1, 2017 at 11:13 AM, Amit Kapila wrote: >> I think for now we can keep the parallel safety check for cheapest >> inner path, though it will be of use only for the very first time we >> compare the paths in that loop. I am not sure if there is any other >> better way to handle the same. > > Done that way. > Thanks, your patch looks good to me. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Wed, Mar 1, 2017 at 11:13 AM, Amit Kapilawrote: > I think for now we can keep the parallel safety check for cheapest > inner path, though it will be of use only for the very first time we > compare the paths in that loop. I am not sure if there is any other > better way to handle the same. Done that way. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v8.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Feb 28, 2017 at 9:28 PM, Dilip Kumarwrote: > On Mon, Feb 27, 2017 at 10:27 AM, Amit Kapila wrote: >> Okay, but in that case don't you think it is better to consider the >> parallel safety of cheapest_total_inner only when we don't find any >> cheap parallel_safe innerpath by reducing the sort keys? > > Well, we can do that but suppose cheapest_total_inner is not parallel > safe and we do not get any parallel safe path which is cheaper than > cheapest_total_inner, then we just end up making the merge join path > with the cheapest parallel safe path but we might have missed some of > the paths whose pathkey is covering more ordered keys. Still, it's > hard to argue what it better because we can always say that if we try > only cheapest parallel safe path we will generate fewer paths. > I think for now we can keep the parallel safety check for cheapest inner path, though it will be of use only for the very first time we compare the paths in that loop. I am not sure if there is any other better way to handle the same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Mon, Feb 27, 2017 at 10:27 AM, Amit Kapilawrote: > Okay, but in that case don't you think it is better to consider the > parallel safety of cheapest_total_inner only when we don't find any > cheap parallel_safe innerpath by reducing the sort keys? Well, we can do that but suppose cheapest_total_inner is not parallel safe and we do not get any parallel safe path which is cheaper than cheapest_total_inner, then we just end up making the merge join path with the cheapest parallel safe path but we might have missed some of the paths whose pathkey is covering more ordered keys. Still, it's hard to argue what it better because we can always say that if we try only cheapest parallel safe path we will generate fewer paths. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Sun, Feb 26, 2017 at 12:01 PM, Robert Haaswrote: > On Fri, Feb 24, 2017 at 3:54 PM, Amit Kapila wrote: >> I agree in some cases it could be better, but I think benefits are not >> completely clear, so probably we can leave it as of now and if later >> any one comes with a clear use case or can see the benefits of such >> path then we can include it. > > TBH, I think Dilip had the right idea here. cheapest_total_inner > isn't anything magical; it's just that there's no reason to use > anything but the cheapest path for a relation when forming a join path > unless that first path lacks some property that you need, like having > the right sortkeys or being parallel-safe. Since there are lots of > join paths that just want to do things in the cheapest way possible, > we identify the cheapest path and hang on to it; when that's not what > we need, we go find the path or paths that have the properties we > want. I'm not sure why this case should be an exception. You could > argue that if the cheapest parallel-safe path is already more > expensive then the parallel join may not pay off, but that's hard to > say; it depends on what happens higher up in the plan tree. > Okay, but in that case don't you think it is better to consider the parallel safety of cheapest_total_inner only when we don't find any cheap parallel_safe innerpath by reducing the sort keys? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Feb 24, 2017 at 3:54 PM, Amit Kapilawrote: > I agree in some cases it could be better, but I think benefits are not > completely clear, so probably we can leave it as of now and if later > any one comes with a clear use case or can see the benefits of such > path then we can include it. TBH, I think Dilip had the right idea here. cheapest_total_inner isn't anything magical; it's just that there's no reason to use anything but the cheapest path for a relation when forming a join path unless that first path lacks some property that you need, like having the right sortkeys or being parallel-safe. Since there are lots of join paths that just want to do things in the cheapest way possible, we identify the cheapest path and hang on to it; when that's not what we need, we go find the path or paths that have the properties we want. I'm not sure why this case should be an exception. You could argue that if the cheapest parallel-safe path is already more expensive then the parallel join may not pay off, but that's hard to say; it depends on what happens higher up in the plan tree. That's why the current code handles partial hash joins this way: /* * Normally, given that the joinrel is parallel-safe, the cheapest * total inner path will also be parallel-safe, but if not, we'll * have to search cheapest_parameterized_paths for the cheapest * safe, unparameterized inner path. If doing JOIN_UNIQUE_INNER, * we can't use any alternative inner path. */ if (cheapest_total_inner->parallel_safe) cheapest_safe_inner = cheapest_total_inner; else if (save_jointype != JOIN_UNIQUE_INNER) { ListCell *lc; foreach(lc, innerrel->cheapest_parameterized_paths) { Path *innerpath = (Path *) lfirst(lc); if (innerpath->parallel_safe && bms_is_empty(PATH_REQ_OUTER(innerpath))) { cheapest_safe_inner = innerpath; break; } } } I would tend to think this case ought to be handled in a similar way. And if not, then we ought to go change the hash join logic too so that they match. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Sat, Feb 25, 2017 at 3:22 PM, Dilip Kumarwrote: > On Sat, Feb 25, 2017 at 11:29 AM, Amit Kapila wrote: >> It seems you have forgotten to change in the below check: >> >> + if (innerpath != NULL && >> + innerpath->parallel_safe && >> + (cheapest_startup_inner == NULL || >> + cheapest_startup_inner->parallel_safe == false || >> + compare_path_costs(innerpath, cheapest_startup_inner, >> + STARTUP_COST) < 0)) > > Oops, Fixed. > Thanks, patch looks good to me (apart from some of the code comments which might need some adjustment and I think it is better if Committer takes care of same as required rather than we argue about it) and I have changed the status as Ready For Committer. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Sat, Feb 25, 2017 at 11:29 AM, Amit Kapilawrote: > It seems you have forgotten to change in the below check: > > + if (innerpath != NULL && > + innerpath->parallel_safe && > + (cheapest_startup_inner == NULL || > + cheapest_startup_inner->parallel_safe == false || > + compare_path_costs(innerpath, cheapest_startup_inner, > + STARTUP_COST) < 0)) Oops, Fixed. > > > + /* Found a cheap (or even-cheaper) sorted parallel safer path */ > > typo > /safer/safe Fixed > > Note - Change the patch status in CF app (to Needs Review) whenever > you post a new patch. I could see that your other patch also needs a > similar update in CF app. Will changes, Thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v7.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Feb 24, 2017 at 4:49 PM, Dilip Kumarwrote: > On Fri, Feb 24, 2017 at 3:04 PM, Amit Kapila wrote: >> What advantage do you see for considering such a path when the cost of >> innerpath is more than cheapest_total_inner? Remember the more paths >> we try to consider, the more time we spend in the planner. By any >> chance are you able to generate any query where it will give benefit >> by considering costlier innerpath? > > Changed > It seems you have forgotten to change in the below check: + if (innerpath != NULL && + innerpath->parallel_safe && + (cheapest_startup_inner == NULL || + cheapest_startup_inner->parallel_safe == false || + compare_path_costs(innerpath, cheapest_startup_inner, + STARTUP_COST) < 0)) + /* Found a cheap (or even-cheaper) sorted parallel safer path */ typo /safer/safe Note - Change the patch status in CF app (to Needs Review) whenever you post a new patch. I could see that your other patch also needs a similar update in CF app. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Feb 24, 2017 at 3:04 PM, Amit Kapilawrote: > What advantage do you see for considering such a path when the cost of > innerpath is more than cheapest_total_inner? Remember the more paths > we try to consider, the more time we spend in the planner. By any > chance are you able to generate any query where it will give benefit > by considering costlier innerpath? Changed > > 2. > +static void generate_parallel_mergejoin_paths(PlannerInfo *root, > + RelOptInfo *joinrel, > + RelOptInfo *innerrel, > + Path *outerpath, > + JoinType jointype, > + JoinPathExtraData *extra, > + Path *inner_cheapest_total, > + List *merge_pathkeys); > > It is better to name this function as > generate_partial_mergejoin_paths() as we are generating only partial > paths in this function and accordingly change the comment on top of > the function. I see that you might be naming it based on > consider_parallel_*, however, I think it is better to use partial in > the name as that is what we are doing inside that function. Also, I > think this function has removed/changed some handling related to > unique outer and full joins, so it is better to mention that in the > function comments, something like "unlike above function, this > function doesn't expect to handle join types JOIN_UNIQUE_OUTER or > JOIN_FULL" and add Assert for both of those types. Done > > 3. > A test case is still missing. Added -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v6.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Feb 24, 2017 at 3:42 PM, Dilip Kumarwrote: > On Fri, Feb 24, 2017 at 3:04 PM, Amit Kapila wrote: > > Thanks for the comments. > >> What advantage do you see for considering such a path when the cost of >> innerpath is more than cheapest_total_inner? Remember the more paths >> we try to consider, the more time we spend in the planner. By any >> chance are you able to generate any query where it will give benefit >> by considering costlier innerpath? > > If inner_cheapest_total path is not parallel safe then I am trying to > find the cheapest parallel safe path and generate partial merge join. > Not sure, if we can just ignore the cheapest inner path because it is not parallel safe. It is also possible that you pick costly inner path just because it is parallel safe and then, later on, you have to discard it because the overall cost of partial merge join is much more. > I agree that it will consider one extra path while considering every > partial merge join path > Hmm, AFAICS, it will consider two extra paths per sort key (the loop around considering such paths is up to num_sortkeys) > (I think with this addition we will not get > parallel merge path for the any more TPCH query). I feel in general > we can get some better plan by this especially when gather is the > cheapest path for inner relation(which is not parallel safe) but at > the cost of considering some extra paths. I agree in some cases it could be better, but I think benefits are not completely clear, so probably we can leave it as of now and if later any one comes with a clear use case or can see the benefits of such path then we can include it. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Fri, Feb 24, 2017 at 3:04 PM, Amit Kapilawrote: Thanks for the comments. > What advantage do you see for considering such a path when the cost of > innerpath is more than cheapest_total_inner? Remember the more paths > we try to consider, the more time we spend in the planner. By any > chance are you able to generate any query where it will give benefit > by considering costlier innerpath? If inner_cheapest_total path is not parallel safe then I am trying to find the cheapest parallel safe path and generate partial merge join. I agree that it will consider one extra path while considering every partial merge join path (I think with this addition we will not get parallel merge path for the any more TPCH query). I feel in general we can get some better plan by this especially when gather is the cheapest path for inner relation(which is not parallel safe) but at the cost of considering some extra paths. What is your thought on this? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Feb 14, 2017 at 5:22 PM, Dilip Kumarwrote: > On Tue, Feb 14, 2017 at 12:25 PM, Amit Kapila wrote: > Apart from this, there was one small problem in an earlier version > which I have corrected in this. > > + /* Consider only parallel safe inner path */ > + if (innerpath != NULL && > + innerpath->parallel_safe && > + (cheapest_total_inner == NULL || > + cheapest_total_inner->parallel_safe == false || > + compare_path_costs(innerpath, cheapest_total_inner, > + TOTAL_COST) < 0)) > > In this comparison, we were only checking if innerpath is cheaper than > the cheapest_total_inner then generate path with this new inner path > as well. Now, I have added one more check if cheapest_total_inner was > not parallel safe then also consider a path with this new inner > (provided this inner is parallel safe). > What advantage do you see for considering such a path when the cost of innerpath is more than cheapest_total_inner? Remember the more paths we try to consider, the more time we spend in the planner. By any chance are you able to generate any query where it will give benefit by considering costlier innerpath? 2. +static void generate_parallel_mergejoin_paths(PlannerInfo *root, + RelOptInfo *joinrel, + RelOptInfo *innerrel, + Path *outerpath, + JoinType jointype, + JoinPathExtraData *extra, + Path *inner_cheapest_total, + List *merge_pathkeys); It is better to name this function as generate_partial_mergejoin_paths() as we are generating only partial paths in this function and accordingly change the comment on top of the function. I see that you might be naming it based on consider_parallel_*, however, I think it is better to use partial in the name as that is what we are doing inside that function. Also, I think this function has removed/changed some handling related to unique outer and full joins, so it is better to mention that in the function comments, something like "unlike above function, this function doesn't expect to handle join types JOIN_UNIQUE_OUTER or JOIN_FULL" and add Assert for both of those types. 3. A test case is still missing. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Feb 14, 2017 at 12:25 PM, Amit Kapilawrote: > Few review comments on the latest version of the patch: > 1. > - if (joinrel->consider_parallel && nestjoinOK && > - save_jointype != JOIN_UNIQUE_OUTER && > - bms_is_empty(joinrel->lateral_relids)) > + if (outerrel->partial_pathlist == NIL || > + !joinrel->consider_parallel || > + save_jointype == JOIN_UNIQUE_OUTER || > + !bms_is_empty(joinrel->lateral_relids) || > + jointype == JOIN_FULL || > + jointype == JOIN_RIGHT) > + return; > > > For the matter of consistency, I think the above checks should be in > same order as they are in function hash_inner_and_outer(). I wonder > why are you using jointype in above check instead of save_jointype, it > seems to me we can use save_jointype for all the cases. Done > > 2. > + generate_mergejoin_paths(root, joinrel, innerrel, outerpath, > + > jointype, extra, false, > + > inner_cheapest_total, merge_pathkeys, > + > true); > > IIUC, the reason of passing a 7th parameter (useallclauses) as false > is that it can be true only for Right and Full joins and for both we > don't generate partial merge join paths. If so, then I think it is > better to add a comment about such an assumption, so that we don't > forget to update this code in future if we need to useallclauses for > something else. Another way to deal with this is that you can pass > the value of useallclauses to consider_parallel_mergejoin() and then > to generate_mergejoin_paths(). I feel second way is better, but if > for some reason you don't find it appropriate then at least add a > comment. After fixing 3rd comments this will not be needed. > > 3. > generate_mergejoin_paths() > { > > The above and similar changes in generate_mergejoin_paths() doesn't > look good and in some cases when innerpath is *parallel-unsafe*, you > need to perform some extra computation which is not required. How > about writing a separate function generate_partial_mergejoin_paths()? > I think you can save some extra computation and it will look neat. I > understand that there will be some code duplication, but not sure if > there is any other better way. Okay, I have done as suggested. Apart from this, there was one small problem in an earlier version which I have corrected in this. + /* Consider only parallel safe inner path */ + if (innerpath != NULL && + innerpath->parallel_safe && + (cheapest_total_inner == NULL || + cheapest_total_inner->parallel_safe == false || + compare_path_costs(innerpath, cheapest_total_inner, + TOTAL_COST) < 0)) In this comparison, we were only checking if innerpath is cheaper than the cheapest_total_inner then generate path with this new inner path as well. Now, I have added one more check if cheapest_total_inner was not parallel safe then also consider a path with this new inner (provided this inner is parallel safe). -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Wed, Jan 4, 2017 at 9:27 PM, Dilip Kumarwrote: > On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila wrote: >> 3. >> + /* >> + * See comments in try_partial_nestloop_path(). >> + */ >> >> This same code exists in try_partial_nestloop_path() and >> try_partial_hashjoin_path() with minor difference of code in >> try_partial_hashjoin_path(). Can't we write a separate inline >> function for this code and call from all the three places. > > It's a small check, is it ok to make it the separate function. I have > also observed similar kind of duplicate code in try_mergejoin_path and > try_hashjoin_path. However, if you think that it will be better to > move that check to an inline function I can submit a seperate patch in > this thread as a refactoring patch? > Let's keep that check unchanged. Few review comments on the latest version of the patch: 1. - if (joinrel->consider_parallel && nestjoinOK && - save_jointype != JOIN_UNIQUE_OUTER && - bms_is_empty(joinrel->lateral_relids)) + if (outerrel->partial_pathlist == NIL || + !joinrel->consider_parallel || + save_jointype == JOIN_UNIQUE_OUTER || + !bms_is_empty(joinrel->lateral_relids) || + jointype == JOIN_FULL || + jointype == JOIN_RIGHT) + return; For the matter of consistency, I think the above checks should be in same order as they are in function hash_inner_and_outer(). I wonder why are you using jointype in above check instead of save_jointype, it seems to me we can use save_jointype for all the cases. 2. + generate_mergejoin_paths(root, joinrel, innerrel, outerpath, + jointype, extra, false, + inner_cheapest_total, merge_pathkeys, + true); IIUC, the reason of passing a 7th parameter (useallclauses) as false is that it can be true only for Right and Full joins and for both we don't generate partial merge join paths. If so, then I think it is better to add a comment about such an assumption, so that we don't forget to update this code in future if we need to useallclauses for something else. Another way to deal with this is that you can pass the value of useallclauses to consider_parallel_mergejoin() and then to generate_mergejoin_paths(). I feel second way is better, but if for some reason you don't find it appropriate then at least add a comment. 3. generate_mergejoin_paths() { .. .. - try_mergejoin_path(root, - joinrel, - outerpath, - innerpath, - merge_pathkeys, - newclauses, - NIL, - NIL, - jointype, - extra); + if (!is_partial) + try_mergejoin_path(root, + joinrel, + outerpath, + innerpath, + merge_pathkeys, + newclauses, + NIL, + NIL, + jointype, + extra); + /* Generate partial path only if innerpath is parallel safe. */ + else if (innerpath->parallel_safe) + try_partial_mergejoin_path(root, + joinrel, + outerpath, + innerpath, + merge_pathkeys, + newclauses, + NIL, + NIL, + jointype, + extra); .. } The above and similar changes in generate_mergejoin_paths() doesn't look good and in some cases when innerpath is *parallel-unsafe*, you need to perform some extra computation which is not required. How about writing a separate function generate_partial_mergejoin_paths()? I think you can save some extra computation and it will look neat. I understand that there will be some code duplication, but not sure if there is any other better way. How about adding one simple test case as I have kept for other parallel patches like parallel index scan, parallel subplan, etc.? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Thu, Jan 5, 2017 at 12:57 AM, Dilip Kumarwrote: > On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapila wrote: >> Review comments: >> 1. >> + bool is_partial); >> + >> >> Seems additional new line is not required. > Fixed This patch has a patch, no new reviews. Moved to CF 2017-03. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Wed, Jan 4, 2017 at 12:02 PM, Amit Kapilawrote: > Review comments: > 1. > + bool is_partial); > + > > Seems additional new line is not required. Fixed > > 2. > + * try_partial_mergejoin_path > + * Consider a partial merge join path; if it appears useful, > push it into > + * the joinrel's pathlist via add_path(). > + */ > > I think in above comment, you should say ".. joinrel's > partial_pathlist via add_partial_path()" Fixed > > 3. > + /* > + * See comments in try_partial_nestloop_path(). > + */ > > This same code exists in try_partial_nestloop_path() and > try_partial_hashjoin_path() with minor difference of code in > try_partial_hashjoin_path(). Can't we write a separate inline > function for this code and call from all the three places. It's a small check, is it ok to make it the separate function. I have also observed similar kind of duplicate code in try_mergejoin_path and try_hashjoin_path. However, if you think that it will be better to move that check to an inline function I can submit a seperate patch in this thread as a refactoring patch? I observed one more issue that in case of partial merge join inner_paramrels should be empty not subset of outer, so fixed the same in the attached version. The code in the previous patch will also not create any problem, because if the inner path is parameterized by outerrel it will not create any merge join path. > > 4. > + /* > + * See comments in try_nestloop_path(). > + */ > + initial_cost_mergejoin(root, > , jointype, mergeclauses, > + outer_path, > inner_path, > + outersortkeys, innersortkeys, > + > extra->sjinfo); > > I think in above comments, you want to say try_partial_nestloop_path(). Fixed > 5. > - if (joinrel->consider_parallel && nestjoinOK && > - save_jointype != JOIN_UNIQUE_OUTER && > - bms_is_empty(joinrel->lateral_relids)) > + if (!joinrel->consider_parallel || > + save_jointype == JOIN_UNIQUE_OUTER || > + !bms_is_empty(joinrel->lateral_relids) || > + jointype == JOIN_FULL || > + jointype == JOIN_RIGHT) > + return; > + > + if (nestjoinOK) > > Here, we only want to create partial paths when > outerrel->partial_pathlist is not NILL, so I think you can include > that check as well. Done. Another point to note is that in HEAD, we are not > checking for jointype as JOIN_RIGHT or JOIN_FULL for considering > parallel nestloop paths, whereas with your patch, it will do those is al> checks, is it a bug of HEAD or is there any other reason for including > those checks for parallel nestloop paths? Actually we can not create parallel join path in case of JOIN_RIGHT or JOIN_FULL. nestjoinOK is marked false in case of JOIN_RIGHT or JOIN_FULL. So it was not considered in base code as well. Now we have mergejoin as well so it's better to keep a common check. > > 6. > + /* Can't generate mergejoin path if inner rel is parameterized by outer */ > + if (inner_cheapest_total != NULL) > + { > + ListCell *lc1; > + > + /* generate merge join path for each partial outer path */ > + foreach(lc1, outerrel->partial_pathlist) > + { > + Path *outerpath = (Path *) lfirst(lc1); > + List *merge_pathkeys; > + > + /* > + * Figure out what useful ordering any paths we create > + * will have. > + */ > + merge_pathkeys = build_join_pathkeys(root, joinrel, jointype, > + outerpath->pathkeys); > + > + generate_mergejoin_paths(root, joinrel, innerrel, outerpath, > + save_jointype, extra, false, > + inner_cheapest_total, merge_pathkeys, > + true); > + } > + > + } > > Won't it be better if you encapsulate the above chunk of code in > function consider_parallel_merjejoin() similar to > consider_parallel_nestloop()? I think that way code will look > cleaner. Done > > 7. > + /* > + * Figure out what useful ordering any paths we create > + * will have. > + */ > + merge_pathkeys = build_join_pathkeys(root, joinrel, jointype, > + outerpath->pathkeys); > > indentation problem with variable outerpath->pathkeys. Fixed > > 8. > - try_mergejoin_path(root, > - joinrel, > - outerpath, > - inner_cheapest_total, > - merge_pathkeys, > - mergeclauses, > - NIL, > - innersortkeys, > - jointype, > - extra); > + if (!is_partial) > + try_mergejoin_path(root, > + joinrel, > + outerpath, > + inner_cheapest_total, > + merge_pathkeys, > + mergeclauses, > + NIL, > + innersortkeys, > + jointype, > + extra); > + > + /* Generate partial path if inner is parallel safe. */ > + else if (inner_cheapest_total->parallel_safe) > + try_partial_mergejoin_path(root, > + joinrel, > + outerpath, > + inner_cheapest_total, > + merge_pathkeys, > + mergeclauses, > + NIL, > + innersortkeys, > + jointype, > + extra); > > In above code indentation is broken, similarly, there is another code > in a patch where it is broken, try using pgindent. Fixed with pgindent. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list
Re: [HACKERS] Proposal : Parallel Merge Join
On Wed, Dec 21, 2016 at 9:23 PM, Dilip Kumarwrote: > On Wed, Dec 21, 2016 at 8:39 PM, Robert Haas wrote: >> Committed the refactoring patch with some mild cosmetic adjustments. > > Thanks.. >> >> As to the second patch: >> >> +if (jointype == JOIN_UNIQUE_INNER) >> +jointype = JOIN_INNER; >> >> Isn't this dead code? save_jointype might that value, but jointype won't. > > Yes, it is. > > I have fixed this, and also observed that comment for > try_partial_mergejoin_path header was having some problem, fixed that > too. > Review comments: 1. + bool is_partial); + Seems additional new line is not required. 2. + * try_partial_mergejoin_path + * Consider a partial merge join path; if it appears useful, push it into + * the joinrel's pathlist via add_path(). + */ I think in above comment, you should say ".. joinrel's partial_pathlist via add_partial_path()" 3. + /* + * See comments in try_partial_nestloop_path(). + */ + Assert(bms_is_empty (joinrel->lateral_relids)); + if (inner_path->param_info != NULL) + { + Relids inner_paramrels = inner_path->param_info->ppi_req_outer; + + if (!bms_is_subset (inner_paramrels, outer_path->parent->relids)) + return; + } This same code exists in try_partial_nestloop_path() and try_partial_hashjoin_path() with minor difference of code in try_partial_hashjoin_path(). Can't we write a separate inline function for this code and call from all the three places. 4. + /* + * See comments in try_nestloop_path(). + */ + initial_cost_mergejoin(root, , jointype, mergeclauses, + outer_path, inner_path, + outersortkeys, innersortkeys, + extra->sjinfo); I think in above comments, you want to say try_partial_nestloop_path(). 5. - if (joinrel->consider_parallel && nestjoinOK && - save_jointype != JOIN_UNIQUE_OUTER && - bms_is_empty(joinrel->lateral_relids)) + if (!joinrel->consider_parallel || + save_jointype == JOIN_UNIQUE_OUTER || + !bms_is_empty(joinrel->lateral_relids) || + jointype == JOIN_FULL || + jointype == JOIN_RIGHT) + return; + + if (nestjoinOK) Here, we only want to create partial paths when outerrel->partial_pathlist is not NILL, so I think you can include that check as well. Another point to note is that in HEAD, we are not checking for jointype as JOIN_RIGHT or JOIN_FULL for considering parallel nestloop paths, whereas with your patch, it will do those checks, is it a bug of HEAD or is there any other reason for including those checks for parallel nestloop paths? 6. + /* Can't generate mergejoin path if inner rel is parameterized by outer */ + if (inner_cheapest_total != NULL) + { + ListCell *lc1; + + /* generate merge join path for each partial outer path */ + foreach(lc1, outerrel->partial_pathlist) + { + Path *outerpath = (Path *) lfirst(lc1); + List *merge_pathkeys; + + /* + * Figure out what useful ordering any paths we create + * will have. + */ + merge_pathkeys = build_join_pathkeys(root, joinrel, jointype, + outerpath->pathkeys); + + generate_mergejoin_paths(root, joinrel, innerrel, outerpath, + save_jointype, extra, false, + inner_cheapest_total, merge_pathkeys, + true); + } + + } Won't it be better if you encapsulate the above chunk of code in function consider_parallel_merjejoin() similar to consider_parallel_nestloop()? I think that way code will look cleaner. 7. + /* + * Figure out what useful ordering any paths we create + * will have. + */ + merge_pathkeys = build_join_pathkeys(root, joinrel, jointype, + outerpath->pathkeys); indentation problem with variable outerpath->pathkeys. 8. - try_mergejoin_path(root, - joinrel, - outerpath, - inner_cheapest_total, - merge_pathkeys, - mergeclauses, - NIL, - innersortkeys, - jointype, - extra); + if (!is_partial) + try_mergejoin_path(root, + joinrel, + outerpath, + inner_cheapest_total, + merge_pathkeys, + mergeclauses, + NIL, + innersortkeys, + jointype, + extra); + + /* Generate partial path if inner is parallel safe. */ + else if (inner_cheapest_total->parallel_safe) + try_partial_mergejoin_path(root, + joinrel, + outerpath, + inner_cheapest_total, + merge_pathkeys, + mergeclauses, + NIL, + innersortkeys, + jointype, + extra); In above code indentation is broken, similarly, there is another code in a patch where it is broken, try using pgindent. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Thu, Dec 29, 2016 at 3:15 AM, Tomas Vondrawrote: > FWIW, I've done quite a bit of testing on this patch, and also on the other > patches adding parallel index scans and bitmap heap scan. I've been running > TPC-H and TPC-DS on 16GB data sets with each patch, looking for regressions > or crashes. Thanks for looking into this. > > I haven't found any of that so far, which is good of course. It however > seems the plan changes only for very few queries in those benchmarks with > any of the patches, even after tweaking the costs to make parallel plans > more likely. You can also try with reducing random_page_cost (that will help parallel merge join with index scan), in case your data fits in memory and you are ensuring warm cache environment. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
Hi, On 12/21/2016 04:53 PM, Dilip Kumar wrote: On Wed, Dec 21, 2016 at 8:39 PM, Robert Haaswrote: Committed the refactoring patch with some mild cosmetic adjustments. Thanks.. As to the second patch: +if (jointype == JOIN_UNIQUE_INNER) +jointype = JOIN_INNER; Isn't this dead code? save_jointype might that value, but jointype won't. Yes, it is. I have fixed this, and also observed that comment for try_partial_mergejoin_path header was having some problem, fixed that too. FWIW, I've done quite a bit of testing on this patch, and also on the other patches adding parallel index scans and bitmap heap scan. I've been running TPC-H and TPC-DS on 16GB data sets with each patch, looking for regressions or crashes. I haven't found any of that so far, which is good of course. It however seems the plan changes only for very few queries in those benchmarks with any of the patches, even after tweaking the costs to make parallel plans more likely. I'm going to try with larger scales and also --enable-cassert and post the results during CF 2017-1. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Wed, Dec 21, 2016 at 8:39 PM, Robert Haaswrote: > Committed the refactoring patch with some mild cosmetic adjustments. Thanks.. > > As to the second patch: > > +if (jointype == JOIN_UNIQUE_INNER) > +jointype = JOIN_INNER; > > Isn't this dead code? save_jointype might that value, but jointype won't. Yes, it is. I have fixed this, and also observed that comment for try_partial_mergejoin_path header was having some problem, fixed that too. > > Apart from that and some cosmetic issues it looks basically OK to me > on a first read-through. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_mergejoin_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Dec 13, 2016 at 10:04 AM, Dilip Kumarwrote: > On Tue, Dec 13, 2016 at 6:42 AM, Dilip Kumar wrote: >>> This patch is hard to read because it is reorganizing a bunch of code >>> as well as adding new functionality. Perhaps you could separate it >>> into two patches, one with the refactoring and then the other with the >>> new functionality. >> >> Okay, I can do that. > > I have created two patches as per the suggestion. > > 1. mergejoin_refactoring_v2.patch --> Move functionality of > considering various merge join path into new function. > 2. parallel_mergejoin_v2.patch -> This adds the functionality of > supporting partial mergejoin paths. This will apply on top of > mergejoin_refactoring_v2.patch. Committed the refactoring patch with some mild cosmetic adjustments. As to the second patch: +if (jointype == JOIN_UNIQUE_INNER) +jointype = JOIN_INNER; Isn't this dead code? save_jointype might that value, but jointype won't. Apart from that and some cosmetic issues it looks basically OK to me on a first read-through. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Dec 13, 2016 at 8:34 PM, Dilip Kumarwrote: > I have created two patches as per the suggestion. > > 1. mergejoin_refactoring_v2.patch --> Move functionality of > considering various merge join path into new function. > 2. parallel_mergejoin_v2.patch -> This adds the functionality of > supporting partial mergejoin paths. This will apply on top of > mergejoin_refactoring_v2.patch. We have done further analysis of the performance with TPCH benchmark at higher scale factor. I have tested parallel merge join patch along with parallel index scan[1] I have observed that with query3, we are getting linear scalability ('explain analyze' results are attached). Test Setup: --- TPCH 300 scale factor work_mem = 1GB shared_buffer = 1GB random_page_cost=seq_page_cost=0.1 max_parallel_workers_per_gather=4 (warm cache ensured) The median of 3 runs (reading are quite stable). On Head: 2702568.099 ms With Patch: 547363.164 ms Other Experiments: * I have also verified reading on the head, without modifying random_page_cost=seq_page_cost, but there is no change in plan or execution time. * I have tried to increase the max_parallel_workers_per_gather to 8 but I did not observe further scaling. [1] https://www.postgresql.org/message-id/CAA4eK1KA4LwTYXZG%3Dk3J2bA%2BZOEYnz2gkyWBKgY%3D_q0xJRBMDw%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com 3_head.out Description: Binary data 3_patch.out Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Dec 13, 2016 at 6:42 AM, Dilip Kumarwrote: >> This patch is hard to read because it is reorganizing a bunch of code >> as well as adding new functionality. Perhaps you could separate it >> into two patches, one with the refactoring and then the other with the >> new functionality. > > Okay, I can do that. I have created two patches as per the suggestion. 1. mergejoin_refactoring_v2.patch --> Move functionality of considering various merge join path into new function. 2. parallel_mergejoin_v2.patch -> This adds the functionality of supporting partial mergejoin paths. This will apply on top of mergejoin_refactoring_v2.patch. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com mergejoin_refactoring_v2.patch Description: Binary data parallel_mergejoin_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Tue, Dec 13, 2016 at 12:11 AM, Robert Haaswrote: > > Hmm, so it seems my initial guess that we didn't need to bother > generating such paths was wrong. Oops. > > This patch is hard to read because it is reorganizing a bunch of code > as well as adding new functionality. Perhaps you could separate it > into two patches, one with the refactoring and then the other with the > new functionality. Okay, I can do that. > Actually, though, I don't understand why you need > so much rearrangement Actually match_unsorted_outer is quite complex function and considering merge join path for many cases, And In my opinion those all paths should be considered for partial outer as well. So one option was to duplicate that part of code. But I chose to move all that code from match_unsorted_outer (which is related to generating various merge join path) to new function called generate_mergejoin_path. And, use it for normal outer path as well as for partial outer path. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Sat, Dec 10, 2016 at 7:59 AM, Dilip Kumarwrote: > I would like to propose a patch for parallelizing merge join path. > This idea is derived by analyzing TPCH results. > > I have done this analysis along with my colleagues Rafia sabih and Amit > kaplia. > > Currently we already have infrastructure for executing parallel join, > so we don't need any change at executor level. Only in path > generation, we need to add partial paths for merge join, like we do > for nest loop and hash join. Hmm, so it seems my initial guess that we didn't need to bother generating such paths was wrong. Oops. This patch is hard to read because it is reorganizing a bunch of code as well as adding new functionality. Perhaps you could separate it into two patches, one with the refactoring and then the other with the new functionality. Actually, though, I don't understand why you need so much rearrangement -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Sun, Dec 11, 2016 at 8:14 AM, Peter Geogheganwrote: > I noticed that the partially parallelized Merge Join EXPLAIN ANALYZE > that you attach has a big misestimation: > > -> Merge Join (cost=3405052.45..3676948.66 rows=320 width=32) > (actual time=21165.849..37814.551 rows=1357812 loops=4) > > Is this the best test case to show off the patch? This node is the > immediate outer child of a Nested Loop Semi Join, and so I'm concerned > that we measuring the wrong thing. Actually main purpose of showing this example is that, by enabling parallelism with merge join we can enable the parallelism in complete tree, and we can get completely different kind of plan which is way cheaper. But I completely agree with your point that, in this particular example estimation is wrong and may be with correct estimation plan can be entirely different. I am planning to test this with more self generated test case where first we can guarantee that our estimation is almost correct and see the impact of the patch. Here is one such example, this time I tested my patch with Amit's parallel index scan patch [1], just to enable more scope for parallel merge join. All configurations and machines are same what I mentioned up thread. create table test1(a, b) as select generate_series(1, 1000),generate_series(1, 1000); create table test2(a, b) as select generate_series(1, 1000),generate_series(1, 1000); create index idx1 on test1(a); create index idx2 on test2(a); Query: explain analyze select * from test1, test2 where test1.a=test2.a and test1.b = test2.b and test1.a+test2.b < 3 and test1.a < 300 and test2.a < 300; On Head: Merge Join (cost=6.92..228073.90 rows=1 width=16) (actual time=0.033..3123.400 rows=1 loops=1) Merge Cond: (test1.a = test2.a) Join Filter: ((test1.b = test2.b) AND ((test1.a + test2.b) < 3)) Rows Removed by Join Filter: 298 -> Index Scan using idx1 on test1 (cost=0.43..98626.90 rows=2998198 width=8) (actual time=0.013..746.382 rows=299 loops=1) Index Cond: (a < 300) -> Index Scan using idx2 on test2 (cost=0.43..98708.62 rows=3000639 width=8) (actual time=0.012..751.558 rows=299 loops=1) Index Cond: (a < 300) Planning time: 0.604 ms Execution time: 3123.442 ms With Patch: Gather (cost=1005.46..193001.64 rows=1 width=16) (actual time=0.244..1883.087 rows=1 loops=1) Workers Planned: 3 Workers Launched: 3 -> Merge Join (cost=5.46..192000.64 rows=1 width=16) (actual time=1409.686..1880.394 rows=0 loops=4) Merge Cond: (test2.a = test1.a) Join Filter: ((test1.b = test2.b) AND ((test1.a + test2.b) < 3)) Rows Removed by Join Filter: 75 -> Parallel Index Scan using idx2 on test2 (cost=0.43..78381.71 rows=967948 width=8) (actual time=0.023..221.172 rows=75 loops=4) Index Cond: (a < 300) -> Index Scan using idx1 on test1 (cost=0.43..98626.90 rows=2998198 width=8) (actual time=0.024..893.081 rows=2999254 loops=4) Index Cond: (a < 300) Planning time: 0.281 ms Execution time: 1888.750 ms This example shows direct improvement with merge join, but IMHO the main value of this patch is that we can parallelize multi level join query, where parallelism is not used becuase of merge join at some level. [1] https://www.postgresql.org/message-id/CAA4eK1KA4LwTYXZG%3Dk3J2bA%2BZOEYnz2gkyWBKgY%3D_q0xJRBMDw%40mail.gmail.com -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Proposal : Parallel Merge Join
On Sat, Dec 10, 2016 at 4:59 AM, Dilip Kumarwrote: > 3. 20_patch.out : Explain analyze output with patch (median of 3 runs) I noticed that the partially parallelized Merge Join EXPLAIN ANALYZE that you attach has a big misestimation: -> Merge Join (cost=3405052.45..3676948.66 rows=320 width=32) (actual time=21165.849..37814.551 rows=1357812 loops=4) Is this the best test case to show off the patch? This node is the immediate outer child of a Nested Loop Semi Join, and so I'm concerned that we measuring the wrong thing. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Proposal : Parallel Merge Join
I would like to propose a patch for parallelizing merge join path. This idea is derived by analyzing TPCH results. I have done this analysis along with my colleagues Rafia sabih and Amit kaplia. Currently we already have infrastructure for executing parallel join, so we don't need any change at executor level. Only in path generation, we need to add partial paths for merge join, like we do for nest loop and hash join. After this patch, we will get some new type of paths as shown below. -> Gather -> MergeJoin -> Sort -> Partial Seq Scan -> Any parallel safe inner path or -> Gather -> MergeJoin -> Partial Index Scan path (or any sorted partial path) -> Any parallel safe inner path Major benefit of this patch is, it can be helpful in converting some paths to parallel paths where they were not using parallelism at all, may be because we need merge join at intermediate level. Performance: -- - I applied this patch on head and tested TPCH (as of now only with scale factor 10). And, observed that Q20 is giving 2x gain. On head, this query was not using parallelism at all, but after parallel merge, it's able to use parallelism all the way up to top level join (explain analyze result is attached). - I need to test this with different scale factor and also with other patches like parallel index scan, gather merge and I hope to see more paths getting benefited. * 'gather merge' will be useful where we expect sorted path from merge join, because parallel merge join will not give sorted result. Test configuration and machine detail: -- TPCH: scale factor : 10 work_mem : 100MB shared buffer : 1GB max_parallel_workers_per_gather : 3 Machine : POWER 4 socket machine. Attached file: -- 1. parallel_mergejooin_v1.patch : parallel merge join patch 2. 20_head.out : Explain analyze output on head (median of 3 runs) 3. 20_patch.out : Explain analyze output with patch (median of 3 runs) -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com 20_head.out Description: Binary data 20_patch.out Description: Binary data parallel_mergejoin_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers