On Wed, 2 Dec 2020 at 15:51, Dean Rasheed <dean.a.rash...@gmail.com> wrote:
>
> The sort of queries I had in mind were things like this:
>
>   WHERE (a = 1 AND b = 1) OR (a = 2 AND b = 2)
>
> However, the new code doesn't apply the extended stats directly using
> clauselist_selectivity_or() for this kind of query because there are
> no RestrictInfos for the nested AND clauses, so
> find_single_rel_for_clauses() (and similarly
> statext_is_compatible_clause()) regards those clauses as not
> compatible with extended stats. So what ends up happening is that
> extended stats are used only when we descend down to the two AND
> clauses, and their results are combined using the original "s1 + s2 -
> s1 * s2" formula. That actually works OK in this case, because there
> is no overlap between the two AND clauses, but it wouldn't work so
> well if there was.
>
> I'm pretty sure that can be fixed by teaching
> find_single_rel_for_clauses() and statext_is_compatible_clause() to
> handle BoolExpr clauses, looking for RestrictInfos underneath them,
> but I think that should be left for a follow-on patch.
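To make the point about the "s1 + s2 - s1 * s2" formula concrete, here is a rough standalone sketch. It is not PostgreSQL code: the synthetic table (columns a and b perfectly correlated, values 1..10) and the helper names are made up for illustration. It compares the independence-based OR estimate with the actual fraction of matching rows, for a case where the two AND clauses do not overlap and a case where they do.

```python
# Hypothetical data: 1000 rows where b always equals a, with a in 1..10.
rows = [(i, i) for i in range(1, 11)] * 100


def selectivity(pred):
    """Fraction of rows satisfying pred(a, b)."""
    return sum(1 for a, b in rows if pred(a, b)) / len(rows)


def or_independent(s1, s2):
    """Combine two selectivities assuming the clauses are independent."""
    return s1 + s2 - s1 * s2


def p1(a, b):
    return a == 1 and b == 1


def p2(a, b):
    # Never true on a row where p1 is true, so there is no overlap.
    return a == 2 and b == 2


def p3(a, b):
    # Matches exactly the same rows as p1 in this data set, so the
    # overlap is total.
    return a == 1 and b <= 2


# Per-clause selectivities, as good AND-clause estimates would give them.
s1, s2, s3 = selectivity(p1), selectivity(p2), selectivity(p3)

# No overlap: the formula is only slightly below the true value.
est_disjoint = or_independent(s1, s2)
act_disjoint = selectivity(lambda a, b: p1(a, b) or p2(a, b))

# Full overlap: the formula nearly doubles the true value.
est_overlap = or_independent(s1, s3)
act_overlap = selectivity(lambda a, b: p1(a, b) or p3(a, b))

print(f"disjoint: estimated {est_disjoint:.3f}, actual {act_disjoint:.3f}")
print(f"overlap:  estimated {est_overlap:.3f}, actual {act_overlap:.3f}")
```

In the disjoint case the only error is the small s1 * s2 term, which is why the old formula happens to work there. In the overlapping case the true overlap is far larger than s1 * s2, so the estimate is well off even though each AND clause was estimated accurately.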

Attached is a patch doing that, which improves a couple of the
estimates for queries with AND clauses underneath OR clauses, as
expected.

This also revealed a minor bug in the way that the estimates for
multiple statistics objects were combined while processing an OR
clause -- the estimates for the overlaps between clauses only apply to
the current statistics object, so we really have to combine the
estimates for each set of clauses for each statistics object as if
they were independent of one another.

0001 fixes the multiple-extended-stats issue for OR clauses, and 0002
improves the estimates for sub-AND clauses underneath OR clauses.

These are both quite small patches that hopefully won't interfere
with any of the other extended stats patches.

Regards,
Dean

0001-Improve-estimation-of-OR-clauses-using-multiple-exte.patch
Description: Binary data
0002-Improve-estimation-of-ANDs-under-ORs-using-extended-.patch
Description: Binary data