On Wed, Mar 5, 2025 at 4:43 AM Alexander Korotkov <aekorot...@gmail.com> wrote:
>
> On Mon, Mar 3, 2025 at 10:24 AM Andrei Lepikhov <lepi...@gmail.com> wrote:
> > On 17/2/2025 01:34, Alexander Korotkov wrote:
> > > Hi, Andrei!
> > >
> > > On Tue, Oct 8, 2024 at 8:00 AM Andrei Lepikhov <lepi...@gmail.com> wrote:
> > > Thank you for your work on this subject. I agree with the general
> > > direction. While everyone has used conservative estimates for a long
> > > time, it's better to change them only when we're sure about it.
> > > However, I'm still not sure I get the conservatism.
> > >
> > > if (innerbucketsize > thisbucketsize)
> > >     innerbucketsize = thisbucketsize;
> > > if (innermcvfreq > thismcvfreq)
> > >     innermcvfreq = thismcvfreq;
> > >
> > > IFAICS, even in the worst case (all columns are totally correlated),
> > > the overall bucket size should be the smallest bucket size among
> > > clauses (not the largest). And the same is true of MCV. As a mental
> > > experiment, we can add a new clause to hash join, which is always true
> > > because columns on both sides have the same value. In fact, it would
> > > have almost no influence except for the cost of extracting additional
> > > columns and the cost of executing additional operators. But in the
> > > current model, this additional clause would completely ruin
> > > thisbucketsize and thismcvfreq, making hash join extremely
> > > unappealing. Should we still revise this to calculate minimum instead
> > > of maximum?
> > I agree with your point. But I think the code works precisely the way
> > you have described.
>
> You're right. I just messed up with the sides of comparison operator.
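To make the point about the comparison sides concrete, here is a minimal standalone sketch of the clamping in the quoted snippet. It is not the PostgreSQL source: ClauseEstimate and combine_estimates() are hypothetical names made up for illustration, and starting both accumulators at 1.0 is an assumption of the sketch. Only innerbucketsize, thisbucketsize, innermcvfreq and thismcvfreq come from the snippet above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-clause estimates; not a PostgreSQL struct. */
typedef struct ClauseEstimate
{
    double      bucketsize;     /* estimated fraction of inner rows per bucket */
    double      mcvfreq;        /* estimated frequency of the most common value */
} ClauseEstimate;

/*
 * Combine per-clause estimates by keeping the smallest value seen so far,
 * using the same comparisons as the quoted snippet.
 */
void
combine_estimates(const ClauseEstimate *clauses, size_t nclauses,
                  double *innerbucketsize, double *innermcvfreq)
{
    size_t      i;

    assert(clauses != NULL || nclauses == 0);

    /* Assumption of this sketch: start from the most pessimistic value. */
    *innerbucketsize = 1.0;
    *innermcvfreq = 1.0;

    for (i = 0; i < nclauses; i++)
    {
        double      thisbucketsize = clauses[i].bucketsize;
        double      thismcvfreq = clauses[i].mcvfreq;

        /* The accumulator is lowered only when this clause is smaller. */
        if (*innerbucketsize > thisbucketsize)
            *innerbucketsize = thisbucketsize;
        if (*innermcvfreq > thismcvfreq)
            *innermcvfreq = thismcvfreq;
    }
}
```

With these comparisons an extra clause can only leave the combined values unchanged or lower them, so the redundant always-true clause from the mental experiment cannot make the hash join look worse, whatever its own estimate is.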

I've revised the commit message, comments, formatting, etc. I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov
Supabase

v3-0001-Use-extended-stats-for-precise-estimation-of-buck.patch