On 10/01/23 10:53, David Rowley wrote:
On Tue, 10 Jan 2023 at 06:15, Ankit Kumar Pandey <itsanki...@gmail.com> wrote: > Do we have any pending items for this patch now? I'm just wondering if not trying this when the query has a DISTINCT clause is a copout. What I wanted to avoid was doing additional sorting work for WindowAgg just to have it destroyed by Hash Aggregate. I'm now wondering if adding both the original slightly-less-sorted path plus the new slightly-more-sorted path then if distinct decides to Hash Aggregate then it'll still be able to pick the cheapest input path to do that on. Unfortunately, our sort costing just does not seem to be advanced enough to know that sorting by fewer columns might be cheaper, so adding the additional path is likely just going to result in add_path() ditching the old slightly-less-sorted path due to the new slightly-more-sorted path having better pathkeys. So, we'd probably be wasting our time if we added both paths with the current sort costing code.
Maybe we should try and do this for DISTINCT queries if the distinct_pathkeys match the orderby_pathkeys. That seems a little less copout-ish. If the ORDER BY is the same as the DISTINCT then it seems likely that the ORDER BY might opt to use the Unique path for DISTINCT since it'll already have the correct pathkeys. However, if the ORDER BY has fewer columns then it might be cheaper to Hash Aggregate and then sort all over again, especially so when the DISTINCT removes a large proportion of the rows. Ideally, our sort costing would just be better, but I think that raises the bar a little too high to start thinking of making improvements to that for this patch.
Let me take a stab at this. Depending on complexity, we can take a call to address this in current patch or a follow up. -- Regards, Ankit Kumar Pandey