Hi, On 04/30/2016 07:35 PM, Tom Lane wrote:
Julien Rouhaud <julien.rouh...@dalibo.com> writes:On 29/04/2016 18:05, Tom Lane wrote:Julien Rouhaud <julien.rouh...@dalibo.com> writes:The segfault is caused by quals_match_foreign_key() calling get_leftop() and get_rightop() on a ScalarArrayOpExpr node.I'm not sure that assuming this compatibility is the right way to fix this though.It certainly isn't.Agreed. New attached patch handles explicitly each node tag.No, this is completely nuts. The problem is that quals_match_foreign_key is simply assuming that every clause in the list it's given is an OpExpr, which is quite wrong. It should just ignore non-OpExpr quals, since it cannot do anything useful with them anyway. There's a comment claiming that non-OpExpr quals were already rejected: * Here since 'usefulquals' only contains bitmap indexes for quals * of type "var op var" we can safely skip checking this. but that doesn't appear to have anything to do with current reality.
Yes, I agree - there should be a check that the node is OpExpr in quals_match_foreign_key. This is clearly a bug :-(
While this in itself is about a two-line fix, after reviewing 137805f89acb3611 I'm pretty unhappy that it got committed at all. I think this obvious bug is a good sign that it wasn't ready. Other unfinished aspects like invention of an undocumented GUC don't leave a good impression either.
The GUC is meant to make testing during development easier. I haven't realized it got committed, but I assume Simon plans to remove it before the final release. Can't check right now with Simon, though, as he's is out of office this week.
In any case, I do agree the GUC should have been documented better, and we should probably put it on the TODO so that it gets removed before the actual release.
Moreover, it looks to me like this will add quite a lot of overhead, probably far more than is justified, because clauselist_join_selectivity is basically O(N^2) in the relation-footprint of the current join --- and not with a real small multiplier, either, as the functions it calls contain about four levels of nested loops themselves. Maybe that's unmeasurable on trivial test cases but it's going to be disastrous in large queries, or for relations with large numbers of foreign keys.
>That depends on a lot of factors, I guess. Attached are two SQL scripts that create 5 tables (f1, f2, f3, f4, f5) and then create N foreign keys on each of them. The foreign keys reference other tables, which means the loops won't terminate early (after finding the first match) when joining the first 5 tables, which makes this the worst case. And then we join different numbers of tables (2, 3, 4 or 5) and do explain analyze to measure planning time.
The first script (fk-test.sql) does joins on 2 columns, fk-test-6.sql does joins on 6 columns (so that we also know how the number of columns
affects the planning time).Sum of planning time for 100 queries (in milliseconds) with 2 columns, where "2/on" means "join on 2 tables with enable_fk_estimates=on":
N 2/on 2/off 3/on 3/off 4/on 4/off 5/on 5/off 10 6 4 9 8 22 20 77 68 100 15 11 26 19 64 36 177 93 1000 97 84 217 133 516 233 1246 342 and comparison of the timings: 2 3 4 5 10 155% 115% 114% 113% 100 142% 138% 177% 190% 1000 116% 163% 221% 364% And with the 6 columns: 2/on 2/off 3/on 3/off 4/on 4/off 5/on 5/off 10 25 7 23 20 96 82 344 297 100 21 14 65 33 233 104 735 328 1000 151 99 492 153 1603 320 4627 593 and comparison: 2 3 4 5 10 371% 120% 117% 116% 100 149% 196% 224% 224% 1000 152% 322% 502% 780% So yes, the overhead is clearly measurable, no doubt about that.
I think this should be reverted and pushed out to 9.7 development. It needs a significant amount of rewriting to fix the performance issue, and now's not the time to be doing that.
There are probably a few reasonably simple things we could do - e.g. ignore foreign keys with just a single column, as the primary goal of the patch is improving estimates with multi-column foreign keys. I believe that covers a vast majority of foreign keys in the wild.
If that's deemed insufficient, we'll have to resort to a more complex improvement, perhaps something akin to the cache proposed in in the unijoin patch. But if that's required, that's 9.7 material.
regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers