Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

Kuntal Ghosh Thu, 06 Jul 2017 02:15:07 -0700

On Thu, Jul 6, 2017 at 3:45 AM, Tom Lane <[email protected]> wrote:
> I wrote:
>> (Pokes at it some more...) Oh, interesting: it behaves that way except
>> when p is exactly the lowest histogram entry.
>
+ /*
+ * In the first bin (i==1), add a fudge factor that ensures
+ * that histfrac is at least eq_selec.  We do this because we
+ * know that the first histogram entry does satisfy the
+ * inequality (if !isgt) or not satisfy it (if isgt), so our
+ * estimate here should certainly not be zero even if binfrac
+ * is zero.  (XXX experimentally this is the correct way to do
+ * it, but why isn't it a linear adjustment across the whole
+ * histogram rather than just the first bin?)
+ */
Given that the values are distinct, (I've some doubts for real number case)


if histogram_bounds are assigned as,
{0,9,19,29,39,49,59,69,79,89,99,109,119,129,13,..........}

I think the buckets are defined as,
0 < bucket1 <= 9
9 < bucket2 <=19
19 < bucket3 <= 29 and so on.

Because, the histfrac is calculated as follows:

histfrac = (double) (bucket_current - 1) + (val - low) / (high - low);
(where bucket_current is obtained by doing a binary search on
histogram_bounds.)
histfrac /= (double) (nvalues - 1);

So, if val=low, then hisfrac = (bucket_current - 1)/num_of_buckets
which means it assumes val is included in the previous bucket.

This means that it always fails to calculate the selectivity for
lowest histogram boundary. Hence, we need adjustment only for the
first bucket.

Do you think my reasoning justifies your concern?

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP patch: distinguish selectivity of < from <= and > from >=

Reply via email to