On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsanki...@gmail.com> wrote: > Do we have any pending items for this patch now?
I'm just wondering if not trying this when the query has a DISTINCT clause is a copout. What I wanted to avoid was doing additional sorting work for WindowAgg just to have it destroyed by Hash Aggregate. I'm now wondering if adding both the original slightly-less-sorted path plus the new slightly-more-sorted path then if distinct decides to Hash Aggregate then it'll still be able to pick the cheapest input path to do that on. Unfortunately, our sort costing just does not seem to be advanced enough to know that sorting by fewer columns might be cheaper, so adding the additional path is likely just going to result in add_path() ditching the old slightly-less-sorted path due to the new slightly-more-sorted path having better pathkeys. So, we'd probably be wasting our time if we added both paths with the current sort costing code. # explain analyze select * from pg_Class order by relkind,relname; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Sort (cost=36.01..37.04 rows=412 width=273) (actual time=0.544..0.567 rows=412 loops=1) Sort Key: relkind, relname Sort Method: quicksort Memory: 109kB -> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=273) (actual time=0.014..0.083 rows=412 loops=1) Planning Time: 0.152 ms Execution Time: 0.629 ms (6 rows) # explain analyze select * from pg_Class order by relkind; QUERY PLAN --------------------------------------------------------------------------------------------------------------- Sort (cost=36.01..37.04 rows=412 width=273) (actual time=0.194..0.218 rows=412 loops=1) Sort Key: relkind Sort Method: quicksort Memory: 109kB -> Seq Scan on pg_class (cost=0.00..18.12 rows=412 width=273) (actual time=0.014..0.083 rows=412 loops=1) Planning Time: 0.143 ms Execution Time: 0.278 ms (6 rows) the total cost is the same for both of these, but the execution time seems to vary quite a bit. Maybe we should try and do this for DISTINCT queries if the distinct_pathkeys match the orderby_pathkeys. That seems a little less copout-ish. If the ORDER BY is the same as the DISTINCT then it seems likely that the ORDER BY might opt to use the Unique path for DISTINCT since it'll already have the correct pathkeys. However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and then sort all over again, especially so when the DISTINCT removes a large proportion of the rows. Ideally, our sort costing would just be better, but I think that raises the bar a little too high to start thinking of making improvements to that for this patch. David