# Re: [HACKERS] Statistics and selectivity estimation for ranges

On 20.08.2012 00:31, Alexander Korotkov wrote:
On Thu, Aug 16, 2012 at 4:40 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:

On 15.08.2012 11:34, Alexander Korotkov wrote:

OK, we have to decide whether we need the "standard" histogram. In some cases
it can be used for more accurate estimation of the < and > operators.
But I think that is not so important. So, can we replace the "standard"
histogram with histograms of lower and upper bounds?

Yeah, I think that makes more sense. The lower bound histogram is still
useful for the < and > operators, just not as accurate if there are lots of
values with the same lower bound but different upper bounds.

New version of patch.
* Collect a new stakind, STATISTIC_KIND_BOUNDS_HISTOGRAM, instead of
STATISTIC_KIND_HISTOGRAM. It holds the lower and upper bound histograms
combined into a single array of ranges.

Ah, that's an interesting approach. So essentially, the histogram looks just like a normal STATISTIC_KIND_HISTOGRAM histogram, but the values stored in it are not picked the usual way. The usual way would be to pick N evenly-spaced values from the column, and store those. Instead, you pick N evenly-spaced lower bounds, and N evenly-spaced upper bounds, and construct N range values from those. Looking at a single value in the histogram, its lower bound comes from a different row than its upper bound.
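To make the construction concrete, here is a minimal sketch in Python. It is not the patch's code: ranges are modelled as plain `(lower, upper)` tuples with inclusive bounds, and the names `evenly_spaced` and `bounds_histogram` are made up for illustration. The exact positions the patch samples at may differ.

```python
def evenly_spaced(sorted_values, n):
    """Pick n values at evenly spaced positions, including both ends."""
    if n < 2 or len(sorted_values) < 2:
        raise ValueError("need at least two values and two histogram entries")
    last = len(sorted_values) - 1
    return [sorted_values[(i * last) // (n - 1)] for i in range(n)]


def bounds_histogram(ranges, n):
    """Build n histogram ranges from independently sorted bounds.

    The lower and upper bounds are sorted separately, so entry i pairs
    the i-th sampled lower bound with the i-th sampled upper bound,
    which usually come from different rows.
    """
    lowers = sorted(lo for lo, _ in ranges)
    uppers = sorted(hi for _, hi in ranges)
    return list(zip(evenly_spaced(lowers, n), evenly_spaced(uppers, n)))


# Hypothetical sample of range values from a column.
rows = [(1, 10), (2, 3), (4, 20), (5, 6), (7, 8)]
hist = bounds_histogram(rows, 3)
print(hist)  # [(1, 3), (4, 8), (7, 20)]
```

None of the three histogram entries is an actual row of the sample, yet each is a valid range: both bound lists are sampled at the same positions, and the k-th smallest lower bound can never exceed the k-th smallest upper bound.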

That's pretty clever - the histogram has a shape and order that's compatible with a histogram you'd get with the standard scalar typanalyze function. In fact, I think you could just let the standard scalar estimators for < and > use that histogram as is. Perhaps we should use STATISTIC_KIND_HISTOGRAM for this after all...
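As a rough illustration of why a scalar-style estimator could work on such a histogram, here is a sketch in Python. It is a simplification and not PostgreSQL's estimator: ranges are again `(lower, upper)` tuples compared like Python tuples (lower bound first, then upper bound), which ignores empty ranges, infinite bounds and bound inclusivity. The function name `lt_selectivity` and the interpolation on the lower bound are assumptions made for the example.

```python
from bisect import bisect_left


def lt_selectivity(hist, const):
    """Estimate the fraction of rows with value < const.

    hist is an ordered equi-depth histogram of (lower, upper) tuples;
    each of the len(hist) - 1 bins is assumed to hold the same share
    of the rows.
    """
    if len(hist) < 2:
        raise ValueError("histogram needs at least two entries")
    k = bisect_left(hist, const)  # number of entries strictly below const
    if k == 0:
        return 0.0
    if k == len(hist):
        return 1.0
    # const lies in the bin between hist[k - 1] and hist[k];
    # interpolate linearly on the lower bound inside that bin.
    lo, hi = hist[k - 1][0], hist[k][0]
    frac = (const[0] - lo) / (hi - lo) if hi > lo else 0.5
    return (k - 1 + frac) / (len(hist) - 1)


# Histogram built from separately sampled lower and upper bounds.
hist = [(1, 3), (4, 8), (7, 20)]
print(lt_selectivity(hist, (4, 8)))  # 0.5
```

The binary search only relies on the histogram entries being in sorted order, which holds here because the lower bounds are non-decreasing and, where they tie, so are the upper bounds.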

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
