Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Rowley
On Fri, 9 Dec 2022 at 03:38, Tom Lane  wrote:
> It's true that the cost attributed to the Agg node won't impact any
> live decisions in the plan level in which it appears.  However, if
> that's a subquery, then the total cost attributed to the subplan
> could in principle affect plan choices in the outer query.  So there
> is a valid argument for wanting to try to get it right.

I guess the jit thresholds are another reason to try to make the costs
a reflection of the expected run-time too.

> Having said that, I think that the actual impact on outer-level choices
> is usually minimal.  So it didn't bother me that we swept this under
> the rug before --- and I'm pretty sure that we're sweeping similar
> things under the rug elsewhere in top-of-query planning.  However,
> given 1349d279 it should surely be true that the planner knows how many
> sorts it's left to be done at runtime (a number we did not have at hand
> before).  So it seems like it ought to be simple enough to account for
> this effect more accurately.  I'd be in favor of doing so if it's
> simple and cheap to get the numbers.

Ok, probably Heikki's work in 0a2bc5d61 is a more useful piece of work
to get us closer to that goal.  I think all that's required to make it
work is adding on the costs in the final foreach loop in
get_agg_clause_costs().  The Aggrefs have already been marked as
aggpresorted by that time, so it should be a matter of:

if ((aggref->aggorder != NIL || aggref->aggdistinct != NIL) &&
!aggref->aggpresorted)
 // add costs for sort

David




Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Rowley
On Fri, 9 Dec 2022 at 01:12, David Geier  wrote:
> Both plans were captured on 14.5, which is indeed prior to 1349d279.
>
> I disabled sequential scan to show that there's an alternative plan
> which is superior to the chosen plan: Index Only Scan is more expensive
> and takes longer than the Seq Scan, but the subsequent Aggregate runs
> much faster as it doesn't have to sort, making the plan overall superior.

Aha, 14.5. What's going on there is that it's still doing the sort.
The aggregate code in that version does not skip the sort because of
the presorted input. A likely explanation for the performance increase
is due to the presorted check in our qsort implementation. The
successful presort check is O(N), whereas an actual sort is O(N *
logN).

It's true that if we had been doing proper costing on these ORDER BY /
DISTINCT aggregates that we could have noticed that the input path's
pathkeys indicate that no sort is required and costed accordingly, but
if we'd gone to the trouble of factoring that into the costs, then it
would also have made sense to make nodeAgg.c not sort on presorted
input.  We got the latter in 1349d279. It's just we didn't do anything
about the costings in that commit.

Anyway, in the next version of Postgres, the planner is highly likely
to choose the 2nd plan in your original email. It'll also be even
faster than you've shown due to the aggregate code not having to store
and read tuples in the tuplesort object. Also, no O(N) presort check
either.  The performance should be much closer to what it would be if
you disabled seqscan and dropped the DISTINCT out of your aggregate.

David




Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread Tom Lane
David Rowley  writes:
> So, with the assumption that you've used 2 different versions to show
> this output, for post 1349d279, there does not seem to be much choice
> on how the aggregate is executed. What's your concern about the
> costings having to be accurate given there's no other plan choice?

It's true that the cost attributed to the Agg node won't impact any
live decisions in the plan level in which it appears.  However, if
that's a subquery, then the total cost attributed to the subplan
could in principle affect plan choices in the outer query.  So there
is a valid argument for wanting to try to get it right.

Having said that, I think that the actual impact on outer-level choices
is usually minimal.  So it didn't bother me that we swept this under
the rug before --- and I'm pretty sure that we're sweeping similar
things under the rug elsewhere in top-of-query planning.  However,
given 1349d279 it should surely be true that the planner knows how many
sorts it's left to be done at runtime (a number we did not have at hand
before).  So it seems like it ought to be simple enough to account for
this effect more accurately.  I'd be in favor of doing so if it's
simple and cheap to get the numbers.

regards, tom lane




Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Geier

Hi David,

Thanks for the explanations and awesome that this was improved on already.
I didn't have this change on my radar.

On 12/8/22 11:40, David Rowley wrote:


This output surely must be from a version of PostgreSQL prior to
1349d279?  I can't quite figure out why you've added a "SET
enable_seqscan = FALSE;". That makes it look like you've used the same
version of PostgreSQL to produce both of these plans.  The 2nd plan
you've shown must be post 1349d279.



Both plans were captured on 14.5, which is indeed prior to 1349d279.

I disabled sequential scan to show that there's an alternative plan 
which is superior to the chosen plan: Index Only Scan is more expensive 
and takes longer than the Seq Scan, but the subsequent Aggregate runs 
much faster as it doesn't have to sort, making the plan overall superior.




So, with the assumption that you've used 2 different versions to show
this output, for post 1349d279, there does not seem to be much choice
on how the aggregate is executed. What's your concern about the
costings having to be accurate given there's no other plan choice?


There's another plan choice which is using the index to get pre-sorted 
input rows, see previous comment.




The new pre-sorted aggregate code will always request the sort order
that suits the largest number of ORDER BY / DISTINCT aggregates.  When
there are multiple ORDER BY / DISTINCT aggregates and they have
different sort requirements then there certainly are completing ways
that the aggregation portion of the plan can be executed.  I opted to
make the choice just based on the number of aggregates that could
become presorted.  nodeAgg.c is currently not very smart about sharing
sorts between multiple aggregates with the same sort requirements.  If
that was made better, there might be more motivation to have better
costing code in make_pathkeys_for_groupagg().  However, the reasons
for the reversion of db0d67db seemed to be mainly around the lack of
ability to accurately cost multiple competing sort orders. We'd need
to come up with some better way to do that if we were to want to give
make_pathkeys_for_groupagg() similar abilities.


Also, why does the Aggregate node sort itself? Why don't we instead emit
an explicit Sort node in the plan so that it's visible to the user what
is happening? As soon as there's also a GROUP BY in the query, a Sort
node occurs in the plan. This seems inconsistent.

Post 1349d279 we do that, but it can only do it for 1 sort order.
There can be any number of aggregate functions which require a sort
and they don't all have to have the same sort order requirements.  We
can't do the same as WindowAgg does by chaining nodes together either
because the aggregate node aggregates the results and we'd need all
the same input rows to be available at each step.

The only other way would be to have it so an Aggregate node could be
fed by multiple different input nodes and then it would only work on
the aggregates that suit the given input before reading the next input
and doing the other aggregates.  Making it work like that would cause
quite a bit of additional effort during planning (not to mention the
executor). We'd have to run the join search once per required order,
which is one of the slowest parts of planning.  Right now, you could
probably make that work by just writing the SQL to have a subquery per
sort requirement.


Thanks for the explanation!

--
David Geier
(ServiceNow)





Re: Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Rowley
On Thu, 8 Dec 2022 at 22:06, David Geier  wrote:
> The cost of an Aggregate node seems to be the same regardless of the
> input being pre-sorted or not. If however the input is not sorted, the
> Aggregate node must additionally perform a sort which can impact runtime
> significantly. Here is an example:
>
> -- Unsorted input. Aggregate node must additionally sort all rows.
> EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;
>
>  QUERY PLAN
> ---
>   Aggregate  (cost=2584.00..2584.01 rows=1 width=8) (actual
> time=1684.705..1684.809 rows=1 loops=1)
> ->  Seq Scan on foo  (cost=0.00..2334.00 rows=10 width=71)

This output surely must be from a version of PostgreSQL prior to
1349d279?  I can't quite figure out why you've added a "SET
enable_seqscan = FALSE;". That makes it look like you've used the same
version of PostgreSQL to produce both of these plans.  The 2nd plan
you've shown must be post 1349d279.

So, with the assumption that you've used 2 different versions to show
this output, for post 1349d279, there does not seem to be much choice
on how the aggregate is executed. What's your concern about the
costings having to be accurate given there's no other plan choice?

The new pre-sorted aggregate code will always request the sort order
that suits the largest number of ORDER BY / DISTINCT aggregates.  When
there are multiple ORDER BY / DISTINCT aggregates and they have
different sort requirements then there certainly are completing ways
that the aggregation portion of the plan can be executed.  I opted to
make the choice just based on the number of aggregates that could
become presorted.  nodeAgg.c is currently not very smart about sharing
sorts between multiple aggregates with the same sort requirements.  If
that was made better, there might be more motivation to have better
costing code in make_pathkeys_for_groupagg().  However, the reasons
for the reversion of db0d67db seemed to be mainly around the lack of
ability to accurately cost multiple competing sort orders. We'd need
to come up with some better way to do that if we were to want to give
make_pathkeys_for_groupagg() similar abilities.

> Also, why does the Aggregate node sort itself? Why don't we instead emit
> an explicit Sort node in the plan so that it's visible to the user what
> is happening? As soon as there's also a GROUP BY in the query, a Sort
> node occurs in the plan. This seems inconsistent.

Post 1349d279 we do that, but it can only do it for 1 sort order.
There can be any number of aggregate functions which require a sort
and they don't all have to have the same sort order requirements.  We
can't do the same as WindowAgg does by chaining nodes together either
because the aggregate node aggregates the results and we'd need all
the same input rows to be available at each step.

The only other way would be to have it so an Aggregate node could be
fed by multiple different input nodes and then it would only work on
the aggregates that suit the given input before reading the next input
and doing the other aggregates.  Making it work like that would cause
quite a bit of additional effort during planning (not to mention the
executor). We'd have to run the join search once per required order,
which is one of the slowest parts of planning.  Right now, you could
probably make that work by just writing the SQL to have a subquery per
sort requirement.

David




Aggregate node doesn't include cost for sorting

2022-12-08 Thread David Geier

Hi hackers,

The cost of an Aggregate node seems to be the same regardless of the 
input being pre-sorted or not. If however the input is not sorted, the 
Aggregate node must additionally perform a sort which can impact runtime 
significantly. Here is an example:


CREATE TABLE foo(col0 INT, col1 TEXT);
INSERT INTO foo SELECT generate_series(1, 10), 
'aa' || md5(random()::TEXT);

CREATE INDEX foo_idx ON foo(col1);
SET max_parallel_workers_per_gather = 0;
SET enable_bitmapscan = FALSE;

-- Unsorted input. Aggregate node must additionally sort all rows.
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;

    QUERY PLAN
---
 Aggregate  (cost=2584.00..2584.01 rows=1 width=8) (actual 
time=1684.705..1684.809 rows=1 loops=1)
   ->  Seq Scan on foo  (cost=0.00..2334.00 rows=10 width=71) 
(actual time=0.018..343.280 rows=10 loops=1)

 Planning Time: 0.317 ms
 Execution Time: 1685.543 ms


-- Pre-sorted input. Aggregate node doesn't have to sort all rows.
SET enable_seqscan = FALSE;
EXPLAIN ANALYZE SELECT COUNT(DISTINCT(col1)) FROM foo;

QUERY PLAN
---
 Aggregate  (cost=6756.42..6756.43 rows=1 width=8) (actual 
time=819.028..819.041 rows=1 loops=1)
   ->  Index Only Scan using foo_idx on foo (cost=6506.42..6506.42 
rows=10 width=71) (actual time=0.046..404.260 rows=10 loops=1)

 Heap Fetches: 10
 Heap Prefetches: 1334
 Planning Time: 0.438 ms
 Execution Time: 819.515 ms

The cost of the Aggregate node is in both cases the same (250.0) while 
its runtime is 1.3s in the unsorted case and 0.4s in the pre-sorted case.


Also, why does the Aggregate node sort itself? Why don't we instead emit 
an explicit Sort node in the plan so that it's visible to the user what 
is happening? As soon as there's also a GROUP BY in the query, a Sort 
node occurs in the plan. This seems inconsistent.


--
David Geier
(ServiceNow)