Re: POC: GROUP BY optimization

Tom Lane Fri, 26 Jan 2024 07:38:40 -0800

Robert Haas writes:
> On Tue, Dec 26, 2023 at 10:23 PM Tom Lane wrote:
>> I think it's a fool's errand to even try to separate different sort
>> column orderings by cost.  We simply do not have sufficiently accurate
>> cost information.  The previous patch in this thread got reverted because
>> of that (well, also some implementation issues, but mostly that), and
>> nothing has happened to make me think that another try will fare any
>> better.


> I'm late to the party, but I'd like to better understand what's being
> argued here.

What I am saying is that we don't have sufficiently accurate cost
information to support the sort of logic that got committed and
reverted before.  I did not mean to imply that it's not possible
to have such info, only that it is not present today.  IOW, what
I'm saying is that if you want to write code that tries to make
a cost-based preference of one sorting over another, you *first*
need to put in a bunch of legwork to create more accurate cost
numbers.  Trying to make such logic depend on the numbers we have
today is just going to result in garbage in, garbage out.

Sadly, that's not a small task:

* We'd need to put effort into assigning more realistic procost
values --- preferably across the board, not just comparison functions.
As long as all the comparison functions have procost 1.0, you're
just flying blind.

* As you mentioned, there'd need to be some accounting for the
likely size of varlena inputs, and especially whether they might
be toasted.

* cost_sort knows nothing of the low-level sort algorithm improvements
we've made in recent years, such as abbreviated keys.

That's a lot of work, and I think it has to be done before we try
to build infrastructure on top, not afterwards.
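
To make the first point concrete, here is a toy model, not what
cost_sort actually computes. It assumes independent, uniformly
distributed columns, so that a later sort column is only compared
when every earlier column ties. The column names, the ndistinct
figures and the 20.0 "realistic" procost are all made up for
illustration.

```python
from itertools import permutations


def per_comparison_cost(columns):
    """Expected cost of comparing two tuples column by column.

    columns: list of (procost, ndistinct) pairs in sort-key order.
    Toy assumption: a tie on one column has probability 1/ndistinct,
    and columns are independent.
    """
    cost = 0.0
    p_reached = 1.0  # probability that this column gets compared at all
    for procost, ndistinct in columns:
        cost += p_reached * procost
        p_reached *= 1.0 / ndistinct
    return cost


def cheapest_order(columns):
    """Return the column order with the lowest expected comparison cost."""
    return min(
        permutations(columns),
        key=lambda order: per_comparison_cost([columns[name] for name in order]),
    )


# Every comparison function at procost 1.0, as is the case today.
uniform = {"id_int": (1.0, 10), "name_text": (1.0, 1000)}

# Hypothetical numbers: text comparison assumed 20x costlier than int.
weighted = {"id_int": (1.0, 10), "name_text": (20.0, 1000)}

print(cheapest_order(uniform))   # ('name_text', 'id_int')
print(cheapest_order(weighted))  # ('id_int', 'name_text')
```

With uniform procost the model can only rank orderings by ndistinct,
and the preferred ordering flips once the comparison costs differ.
Which of the two answers is right depends entirely on numbers we
do not currently have.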

                        regards, tom lane
