Re: [HACKERS] Statistics and selectivity estimation for ranges

Alexander Korotkov Mon, 06 Aug 2012 21:27:00 -0700

On Mon, Aug 6, 2012 at 6:09 PM, Heikki Linnakangas <
[email protected]> wrote:

> On 04.08.2012 12:31, Alexander Korotkov wrote:
>
>> Hackers,
>>
>> attached patch is for collecting statistics and selectivity estimation for
>> ranges.
>>
>> In order to make our estimations accurate for every distribution of
>> ranges, we would collect 2d-distribution of lower and upper bounds of
>> range
>> into some kind of 2d-histogram. However, this patch use some
>> simplification
>> and assume distribution of lower bound and distribution of length to be
>> independent.
>>
>
> Sounds reasonable. Another possibility would be to calculate the average
> length for each lower-bound bin. So you would e.g know the average length
> of values with lower bound between 1-10, and the average length of values
> with lower bound between 10-20, and so forth. Within a bin, you would have
> to assume that the distribution of the lengths is fixed.
>

Interesting idea. AFAICS, if we store average length for each lower-bound
bin, we still have to assume some kind of distribution of range length in
order to do estimates. For example, assume that range length have
exponential distribution. Correspondingly, we've following trade off: we
don't have to assume lower bound distribution to be independent from length
distribution, but we have to assume kind of length distribution. Actually,
I don't know what is better.
Ideally, we would have range length histogram for each lower-bound bin, or
upper-bound histogram for each lower-bound bin. But, storing such amount of
data seems too expensive.

------
With best regards,
Alexander Korotkov.

Re: [HACKERS] Statistics and selectivity estimation for ranges

Reply via email to