On Tue, Apr 4, 2017 at 8:21 AM, David Rowley
<david.row...@2ndquadrant.com> wrote:
> On 4 April 2017 at 13:35, Claudio Freire <klaussfre...@gmail.com> wrote:
>> On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire <klaussfre...@gmail.com>
>> wrote:
>>> On Mon, Apr 3, 2017 at 8:52 PM, David Rowley
>>> <david.row...@2ndquadrant.com> wrote:
>>>>> One last observation:
>>>>>
>>>>> + /*
>>>>> +  * An IS NOT NULL test is a no-op if there are any other strict
>>>>> +  * quals, so we only apply it when it's the only qual; otherwise
>>>>> +  * we ignore it.
>>>>> +  */
>>>>> + else if (cslist->selmask == CACHESEL_NOTNULLTEST)
>>>>> +     s1 *= cslist->notnullsel;
>>>>>
>>>>> In some paths, the range selectivity estimation code punts and
>>>>> returns DEFAULT_RANGE_INEQ_SEL. In those cases, if a not-null test
>>>>> was present, it should still be applied. It could make a big
>>>>> difference if the selectivity of the null test is high.
>>>>
>>>> I'd say that the location that applies DEFAULT_RANGE_INEQ_SEL should
>>>> apply the nullfrac too. It seems wrong to rely on an IS NOT NULL test
>>>> existing to estimate that properly. I don't think that needs to be
>>>> done as part of this patch. I'd rather limit the scope, since
>>>> "returned with feedback" is already knocking at the door of this
>>>> patch.
>>>
>>> I agree, but this patch regresses for those cases if the user
>>> compensated with a not-null test, leaving little recourse, so I'd fix
>>> it in this patch to avoid the regression.
>>>
>>> Maybe you're right that the null fraction should be applied
>>> regardless of whether there's an explicit null check, though.
>>
>> The more I read that code, the more I'm convinced there's no way to
>> get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL
>> && rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory
>> clauses, a case the current code handles very poorly: it returns
>> DEFAULT_RANGE_INEQ_SEL and ignores nullfrac, which can give way-off
>> estimates if the table has lots of nulls.
>>
>> Say, consider if "value" were mostly null and you wrote:
>>
>> SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value
>> BETWEEN 50 AND 60;
>>
>> All other cases I can think of would end with either hibound or
>> lobound being equal to DEFAULT_INEQ_SEL.
>>
>> It seems to me, doing a git blame, that the checks in the else branch
>> were left only as a precaution, one that seems unnecessary.
>>
>> If you ask me, I'd just leave:
>>
>> + if (rqlist->hibound == DEFAULT_INEQ_SEL ||
>> +     rqlist->lobound == DEFAULT_INEQ_SEL)
>> + {
>> +     /*
>> +      * No point in checking null selectivity: DEFAULT_INEQ_SEL
>> +      * implies we have no stats.
>> +      */
>> +     s2 = DEFAULT_RANGE_INEQ_SEL;
>> + }
>> + else
>> + {
>> ...
>> +     s2 = rqlist->hibound + rqlist->lobound - 1.0;
>> +     nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid);
>> +     notnullsel = 1.0 - nullsel;
>> +
>> +     /* Adjust for double-exclusion of NULLs */
>> +     s2 += nullsel;
>> +     if (s2 <= 0.0)
>> +     {
>> +         if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6))
>> +         {
>> +             /* Most likely contradictory clauses found */
>> +             s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel;
>> +         }
>> +         else
>> +         {
>> +             /* Could be a rounding error */
>> +             s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>> +         }
>> +     }
>> + }
>>
>> where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly
>> cautious) estimate of the amount of rounding error that could be there
>> with 32-bit floats.
>>
>> The above assumes that a non-DEFAULT_INEQ_SEL value in lobound/hibound
>> is an estimate based on stats (maybe approximate, but one that makes
>> sense, i.e. one that preserves the monotonicity of the CPF), and as
>> such, negative values are a sign of either a contradiction or a
>> rounding error. The previous code left open the possibility of
>> negatives coming out of default selectivities creeping up on
>> non-DEFAULT_INEQ_SEL estimates, but that doesn't seem possible coming
>> out of scalarineqsel.
>>
>> But I'd like it if Tom could comment on that, since he's the one who
>> did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back
>> in 2004).
>>
>> Barring that, I'd just change
>>
>> s2 = DEFAULT_RANGE_INEQ_SEL;
>>
>> to
>>
>> s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel;
>>
>> which makes a lot more sense.
>
> I think to fix this properly, the selfuncs would have to return the
> estimate and the nullfrac separately, so the nullfrac could be applied
> just once per expression. There are already hacks in there to get rid
> of the double null filtering. I'm not proposing that as something for
> this patch; it would be pretty invasive to change.
There's no need; you can compute the nullfrac with nulltestsel(). While
there's some rework involved, it's not a big deal (it just reads the
stats tuple), and it's a clean solution.
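
To illustrate, a sketch only (not tested): something along these lines
could replace the places where we currently return the bare default.
The helper name is made up, and note the in-tree nulltestsel() takes
jointype and sjinfo arguments on top of the four used in the snippet
quoted above:

#include "postgres.h"

#include "utils/selfuncs.h"     /* nulltestsel(), DEFAULT_*_SEL */

/*
 * Hypothetical helper, for illustration: when scalarineqsel has punted
 * and we'd otherwise return the bare DEFAULT_RANGE_INEQ_SEL, scale it
 * by the non-null fraction instead of ignoring nullfrac.  NULLs can
 * never satisfy a range qual, so the estimate should never exceed
 * (1 - nullfrac).
 */
static Selectivity
default_range_sel_with_nullfrac(PlannerInfo *root, Node *var, int varRelid)
{
    /*
     * nulltestsel() just reads the column's stats tuple to get its
     * nullfrac; JOIN_INNER and a NULL sjinfo are what restriction
     * clauses end up passing.
     */
    Selectivity nullsel = nulltestsel(root, IS_NULL, var, varRelid,
                                      JOIN_INNER, NULL);

    return DEFAULT_RANGE_INEQ_SEL * (1.0 - nullsel);
}

That keeps the nullfrac handling in one place, and it degrades
gracefully: with no stats tuple, nulltestsel() falls back to a default
selectivity, and the result stays close to DEFAULT_RANGE_INEQ_SEL.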