Re: POC: GROUP BY optimization

Tomas Vondra Tue, 27 Oct 2020 13:20:25 -0700

On Mon, Oct 26, 2020 at 11:40:40AM +0100, Dmitry Dolgov wrote:

On Mon, Oct 26, 2020 at 01:28:59PM +0400, Pavel Borisov wrote:
> Thanks for your interest! FYI there is a new thread about this topic [1]
> with the next version of the patch and more commentaries (I've created
> it for visibility purposes, but probably it also created some confusion,
> sorry for that).
>
> Thanks!


I made a very quick look at your updates and noticed that it is intended to
be simple and some parts of the code are removed as they have little test
coverage. I'd propose vice versa to increase test coverage to enjoy more
precise cost calculation and probably partial grouping.

Or maybe it's worth to benchmark both patches and then re-decide what we
want more to have a more complicated or a simpler version.

Good to know that this feature is not stuck anymore and we have more than
one proposal.
Thanks!


Just to clarify, the patch that I've posted in another thread mentioned
above is not an alternative proposal, but a development of the same
patch I had posted in this thread. As mentioned in [1], reduce of
functionality is an attempt to reduce the scope, and as soon as the base
functionality looks good enough it will be returned back.


I find it hard to follow two similar threads trying to do the same (or
very similar) things in different ways. Is there any chance to join
forces and produce a single patch series merging the changes? With the
"basic" functionality at the beginning, then patches with the more
complex stuff. That's the usual way, I think.

As I said in my response on the other thread [1], I think constructing
additional paths with alternative orderings of pathkeys is the right
approach. Otherwise we can't really deal with optimizations above the
place where we consider this optimization.

That's essentially what I was trying in explain May 16 response [2]
when I actually said this:

   So I don't think there will be a single "interesting" grouping
   pathkeys (i.e. root->group_pathkeys), but a collection of pathkeys.
   And we'll need to build grouping paths for all of those, and leave
   the planner to eventually pick the one giving us the cheapest plan.

I wouldn't go as far as saying the approach in this patch (i.e. picking
one particular ordering) is doomed, but it's going to be very hard to
make it work reliably. Even if we get the costing *at this node* right,
who knows how it'll affect costing of the nodes above us?

So if I can suggest something, I'd merge the two patches, adopting the
path-based approach. With the very basic functionality/costing in the
first patch, and the more advanced stuff in additional patches.

Does that make sense?


regards


[1] 
https://www.postgresql.org/message-id/20200901210743.lutgvnfzntvhuban%40development

[2] 
https://www.postgresql.org/message-id/20200516145609.vm7nrqy7frj4ha6r%40development

--
Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: POC: GROUP BY optimization

Reply via email to