On Fri, 20 Sept 2024 at 20:48, Richard Guo <guofengli...@gmail.com> wrote:
> I agree with you that sometimes the definition of 'regression' can
> depend on when the alternative plans are introduced. Imagine if we
> initially did not have the 1.5x pessimism factor and introduced it
> later, it would be treated as a 'regression' because some queries that
> could benefit from incremental sort would then have to resort to full
> sort.
I think this is a good way of looking at it. I think it's a bad idea
when introducing new planner/executor abilities to penalise the costs
for that ability. It might make the committer sleep better at night
after committing some new feature, but it's just not a future-proof
endeavour. We should aim for our cost models to be as close to the
truth as possible. As soon as you start introducing "just in case"
penalties, you're telling lies. Lies don't work out well, especially
when compounded with other lies, which is exactly what you get when
layering Paths atop Paths with just-in-case penalties added at each
level.

I was in this position with Memoize. The problem there is that when we
don't have any stats, we assume the n_distinct is 200. Memoize can look
quite attractive with such a small n_distinct estimate. I invented
SELFLAG_USED_DEFAULT to allow Memoize to steer clear when the
n_distinct estimate used the hard-coded default. I think that's an ok
thing to do, as otherwise it could work out very badly if Nested Loop
-> Memoize was used instead of, say, a Hash Join on a join problem with
millions of rows, all of them with distinct join values.

> So here is the v2 patch, which is almost the same as v1, but includes
> changes in test cases and comments, and also includes a commit
> message. I'll put it in the commitfest.

Just looking at the commit message:

> The rationale is based on the assumption that incremental sort is
> always faster than full sort when there are presorted keys, a premise
> that has been applied in various parts of the code. This assumption
> does not always hold, particularly in cases with a large skew in the
> number of rows within the presorted groups.

My understanding is that the worst case for incremental sort is the
same as a full sort, i.e. only 1 presorted group, which takes the same
effort to sort. Is there something I'm missing?

David
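
PS: for anyone curious what that Memoize escape hatch looks like, the
guard in get_memoize_path() is roughly the following. This is a
paraphrased sketch from memory, not the verbatim joinpath.c source;
param_exprs and calls are that function's locals for the cache key
expressions and the expected number of inner-side rescans:

    EstimationInfo estinfo;
    double      ndistinct;

    ndistinct = estimate_num_groups(root, param_exprs, calls, NULL,
                                    &estinfo);

    /*
     * If the ndistinct estimate fell back on the hard-coded default
     * (DEFAULT_NUM_DISTINCT, currently 200), we have no real idea how
     * many distinct cache keys there are, so don't gamble on Memoize.
     */
    if ((estinfo.flags & SELFLAG_USED_DEFAULT) != 0)
        return NULL;

The point being that rather than padding Memoize's cost estimate "just
in case", we simply don't build the path when the number it would be
costed on is known to be a guess.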