Thank you Ashutosh.

On Fri, Jan 15, 2021 at 7:18 PM Ashutosh Bapat <ashutosh.bapat....@gmail.com> wrote:
> On Thu, Jan 14, 2021 at 7:12 PM Andy Fan <zhihui.fan1...@gmail.com> wrote:
> >
> > Currently cost_sort doesn't consider the number of columns to sort, which
> > means the cost of SELECT * FROM t ORDER BY a; equals that of SELECT *
> > FROM t ORDER BY a, b; which is obviously wrong. The impact of this shows
> > up when we choose the plan for SELECT DISTINCT * FROM t ORDER BY c between:
> >
> > Sort
> >   Sort Key: c
> >   -> HashAggregate
> >        Group Key: c, a, b, d, e, f, g, h, i, j, k, l, m, n
> >
> > and
> >
> > Unique
> >   -> Sort
> >        Sort Key: c, a, b, d, e, f, g, h, i, j, k, l, m, n
> >
> >
> > Since "Sort (c)" has the same cost as "Sort (c, a, b, d, e, f, g, h, i, j, k,
> > l, m, n)", and a Unique node on sorted input is usually cheaper than
> > HashAggregate, the latter plan will usually win, which might be bad in
> > many places.
>
> I can imagine that HashAggregate + Sort will perform better if there
> are very few distinct rows but otherwise, Unique on top of Sort would
> be a better strategy since it doesn't need two operations.
>

Thanks for the hint, I will consider the number of distinct rows as a factor
in the next patch.

> >
> > Optimizer chose HashAggregate with my patch, but it takes 6s. After setting
> > enable_hashagg = off, it takes 2s.
>
> This example actually shows that using Unique is better than
> HashAggregate + Sort. Maybe you want to try with some data which has
> very few distinct rows.
>

--
Best Regards
Andy Fan (https://www.aliyun.com/)
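
For illustration only, here is a toy model of the cost_sort point quoted above. It is a sketch under stated assumptions, not the formula from the patch or from PostgreSQL itself: the names sort_cost, per_key_cmp_cost and column_aware are hypothetical, the 0.0025 simply borrows the default value of cpu_operator_cost, and charging every sort key on every comparison is a deliberate upper bound (most comparisons are decided by the first key that differs).

```python
import math


def sort_cost(rows, num_keys, per_key_cmp_cost=0.0025, column_aware=True):
    """Toy CPU cost of an in-memory sort: about N * log2(N) comparisons.

    Hypothetical model, not PostgreSQL's cost_sort. With column_aware=False
    the number of sort keys is ignored, which mimics the behaviour described
    in the thread. With column_aware=True every comparison is charged once
    per sort key (a worst-case assumption).
    """
    if rows < 2:
        return 0.0
    keys = num_keys if column_aware else 1
    comparison_cost = 2.0 * per_key_cmp_cost * keys
    return comparison_cost * rows * math.log2(rows)


# Column-blind: "Sort (c)" and "Sort (c, a, b, ..., n)" cost the same.
blind_1 = sort_cost(1_000_000, 1, column_aware=False)
blind_14 = sort_cost(1_000_000, 14, column_aware=False)

# Column-aware: the 14-key sort is charged more than the 1-key sort.
aware_1 = sort_cost(1_000_000, 1)
aware_14 = sort_cost(1_000_000, 14)
```

In the column-blind case the two sorts are indistinguishable to the planner, so only the node on top (Unique vs. HashAggregate) decides the plan. How much extra a wide sort key should really be charged, and how the number of distinct rows enters, is exactly what is still open in this thread.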