On Wed, Nov 20, 2019 at 11:18 AM Tomas Vondra <[email protected]> wrote:
> On Wed, Nov 20, 2019 at 12:36:50PM -0500, Tom Lane wrote: > >Tomas Vondra <[email protected]> writes: > >> On Wed, Nov 20, 2019 at 11:12:56AM -0500, Tom Lane wrote: > >>> I'm content to say that the application should have written the query > >>> with a GROUP BY to begin with. > > > >> I'm not sure I agree with that. The problem is this really depends on > >> the number of rows that will need the subquery result (i.e. based on > >> selectivity of conditions in the outer query). For small number of rows > >> it's fine to execute the subplan repeatedly, for large number of rows > >> it's better to rewrite it to the GROUP BY form. It's hard to make those > >> judgements in the application, I think. > > > >Hm. That actually raises the stakes a great deal, because if that's > >what you're expecting, it would require planning out both the transformed > >and untransformed versions of the query before you could make a cost > >comparison. That's a *lot* harder to do in the context of our > >optimizer's structure, and it also means that the feature would consume > >even more planner cycles, than what I was envisioning (namely, a fixed > >jointree-prep-stage transformation similar to subquery pullup). > > > >I have no idea whether Greenplum really does it like that. > > > > True. I'm not really sure how exactly would the planning logic work or > how Greenplum does it. It might be the case that based on the use cases > they target they simply assume the rewritten query is the right one in > 99% of the cases, so they do the transformation always. Not sure. > > The Greenplum page mentions they also added "join-aggregates reordering", in addition to subquery unnesting. Costing pushing joins below aggregates could probably help. It does increase plan search space quite a bit. Regards, Xun
