On Wed, Jan 11, 2017 at 1:24 PM, Robert Haas <robertmh...@gmail.com> wrote:
>> Well, it's not *that* consistent. If we were estimating all the numbers
>> underneath the Gather as being per-worker numbers, that would make some
>> amount of sense. But neither the other seqscan, nor the hash on it, nor
>> the hashjoin's output count are scaled that way. It's very hard to call
>> the above display anything but flat-out broken.
>
> While investigating why Rushabh Lathia's Gather Merge patch sometimes
> fails to pick a Gather Merge plan even when it really ought to do so,
> I ran smack into this problem. I discovered that this is more than a
> cosmetic issue. The costing itself is actually badly broken.
>
> The reason why this is happening is that final_cost_nestloop(),
> final_cost_hashjoin(), and final_cost_mergejoin() don't care a whit
> about whether the path they are generating is partial. They apply the
> row estimate for the joinrel itself to every such path generated for
> the join, except for parameterized paths which are a special case. I
> think this generally has the effect of discouraging parallel joins,
> because the inflated row count also inflates the join cost. I think
> the right thing to do is probably to scale the row count estimate for
> the joinrel's partial paths by the leader_contribution value computed
> in cost_seqscan.
>
> Despite my general hatred of back-patching things that cause plan
> changes, I'm inclined to think the fix for this should be back-patched
> to 9.6, because this is really a brown-paper-bag bug. If the
> consensus is otherwise I will of course defer to that consensus.
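
To make the scaling described above concrete, here is a rough standalone
sketch of the arithmetic. It is not the attached patch. The helper names
(parallel_divisor_for, partial_path_rows) are hypothetical, and the
0.3-per-worker leader discount is my recollection of what cost_seqscan
does in 9.6, so treat that constant as an assumption:

```c
/*
 * Hypothetical helper: the divisor used to turn a whole-relation row
 * estimate into a per-process one.  Assumes the leader loses roughly
 * 30% of its capacity per worker and stops contributing once that
 * discount reaches 100%.
 */
static double
parallel_divisor_for(int parallel_workers)
{
    double      divisor = parallel_workers;
    double      leader_contribution = 1.0 - (0.3 * parallel_workers);

    if (leader_contribution > 0)
        divisor += leader_contribution;

    return divisor;
}

/*
 * Hypothetical helper: per-process row estimate for a partial join
 * path, given the row estimate for the joinrel as a whole.
 */
static double
partial_path_rows(double joinrel_rows, int parallel_workers)
{
    double      rows;

    /* Non-partial paths keep the joinrel's own estimate. */
    if (parallel_workers <= 0)
        return joinrel_rows;

    /* Scale, round to nearest, and never report fewer than one row. */
    rows = joinrel_rows / parallel_divisor_for(parallel_workers);
    rows = (double) (long long) (rows + 0.5);

    return (rows < 1.0) ? 1.0 : rows;
}
```

With two workers the divisor works out to 2.4, so a joinrel estimate of
1200 rows becomes 500 rows per process. With four workers the leader
contribution goes negative and is ignored, leaving a divisor of 4.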

And here is a patch which seems to fix the problem.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment: parallel-join-rows-v1.patch (application/download)