On Tue, Jul 4, 2017 at 10:56 PM, Kuntal Ghosh <kuntalghosh.2...@gmail.com> wrote:
> On Tue, Jul 4, 2017 at 9:20 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> Kuntal Ghosh <kuntalghosh.2...@gmail.com> writes:
>>> On Tue, Jul 4, 2017 at 9:23 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
>>>> ... I have to admit that I've failed to wrap my brain around exactly
>>>> why it's correct. The arguments that I've constructed so far seem to
>>>> point in the direction of applying the opposite correction, which is
>>>> demonstrably wrong. Perhaps someone whose college statistics class
>>>> wasn't quite so long ago can explain this satisfactorily?
>>
>>> I guess that you're referring the last case, i.e.
>>> explain analyze select * from tenk1 where thousand between 10 and 10;
>>
>> No, the thing that is bothering me is why it seems to be correct to
>> apply a positive correction for ">=", a negative correction for "<",
>> and no correction for "<=" or ">". That seems weird and I can't
>> construct a plausible explanation for it. I think it might be a
>> result of the fact that, given a discrete distribution rather than
>> a continuous one, the histogram boundary values should be understood
>> as having some "width" rather than being zero-width points on the
>> distribution axis. But the arguments I tried to fashion on that
>> basis led to other rules that didn't actually work.
>>
>> It's also possible that this logic is in fact wrong and it just happens
>> to give the right answer anyway for uniformly-distributed cases.
>>
> So, here are two points I think:
> 1. When should we apply(add/subtract) the correction?
> 2. What should be the correction?
>
> The first point:
> there can be further two cases,
> a) histfrac - actual_selectivity(p<=0) = 0.

Sorry for the typo. I meant (p<=10) for all the cases.
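To make the "width" idea from the quoted paragraph concrete, here is a minimal sketch that computes exact selectivities for the four inequality operators on a discrete, uniformly distributed column. It assumes that tenk1.thousand holds each integer 0..999 exactly ten times; the names rows and selectivity are made up for illustration. It does not reproduce the planner's histogram logic or the correction under discussion. It only shows that, on a discrete column, the strict and non-strict forms differ by exactly the fraction of rows equal to the constant.

```python
from fractions import Fraction

# Assumption: model tenk1.thousand as the integers 0..999, each appearing
# ten times (10,000 rows in total).
rows = [v for v in range(1000) for _ in range(10)]


def selectivity(pred):
    """Return the exact fraction of rows that satisfy pred."""
    return Fraction(sum(1 for v in rows if pred(v)), len(rows))


c = 10  # the comparison constant used in the thread's examples
sel_lt = selectivity(lambda v: v < c)
sel_le = selectivity(lambda v: v <= c)
sel_gt = selectivity(lambda v: v > c)
sel_ge = selectivity(lambda v: v >= c)
sel_eq = selectivity(lambda v: v == c)

# The gap between the strict and non-strict operators is exactly the
# selectivity of "= c", i.e. the "width" of the value c itself.
assert sel_le - sel_lt == sel_eq
assert sel_ge - sel_gt == sel_eq

# "<" and ">=" partition the rows, as do "<=" and ">".
assert sel_lt + sel_ge == 1
assert sel_le + sel_gt == 1

print(sel_lt, sel_le, sel_gt, sel_ge, sel_eq)
```

With a continuous distribution sel_eq would be zero and the four operators would collapse into two, which is why the question of a correction only arises in the discrete case.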

