>>Nate Allan <nal...@ancestry.com> writes:
>> It seems to me that the join condition (and hence the restriction) should be 
>> pushed down into both sides of the union to bring the cardinality limit from 
>> millions to 1.  I'm imagining a rewrite like this:  
>>      R(a) J (b U c)  ->  (b J R(a)) U (c J R(a)) ...where R = Restrict, J 
>> = Join, U = Union

>[ eyes that suspiciously ... ]  I'm not convinced that such a transformation 
>is either correct in general (you seem to be assuming at least that A's join 
>column is unique, and >what is the UNION operator supposed to do with A's 
>other columns?) or likely to lead to a performance improvement in general.

If there are more columns, you are correct that you might have to project off 
any additional columns within the union, and leave the join outside of the 
union intact to bring in the extra columns.  Those are essentially the same 
considerations as when making other rewrites though.  As for this optimization 
making unions faster in general, I would argue that it is rather easy to 
produce a plan superior to complete materialization of the union.

>We possibly could push down a join condition on the inner side of a nestloop, 
>similarly to what's done in the UNION ALL case ... but that would require a 
>complete >refactoring of what the planner does with UNIONs.  By and large, 
>very little optimization effort has been put into non-ALL UNION (or INTERSECT 
>or EXCEPT).  You should >not expect that to change on a time scale of less 
>than years.

I hate to come across as contrary, but I'm pretty shocked by this answer for a 
couple reasons:
1) This is a clear-cut case of an untenable execution plan, essentially a bug 
in the planner.  This response contradicts the widely broadcast assertion that 
the PG community fixes planner bugs quickly and will not introduce hints 
because they would rather address these kinds of issues "correctly".
2) Why would more effort go into Union All rather than Union?  Are people using 
Union All more than Union, and if so is this because they actually want 
duplicates or is it because they've been trained to due to the performance 
problems with Union?  Union All, in many people's opinions, shouldn't even 
exist in a true relational sense.

Again, sorry if I'm coming off as abrasive, I've spent political capital 
pushing to get PG in on this project, and now I'm a little worried about 
whether it is going to work for this kind of scale and complexity, so I'm a 
little stressed.  I do appreciate your responses.

Best,

-Nate



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Reply via email to