Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-14 Thread Heikki Linnakangas
On 01.03.2013 16:22, Alexander Korotkov wrote: On Wed, Mar 13, 2013 at 11:10 PM, Heikki Linnakangas< hlinnakan...@vmware.com> wrote: On 01.03.2013 16:22, Alexander Korotkov wrote: frac = area / (length2 - length1); you can get NaN result. I've especially adjusted the code to get more of le

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-13 Thread Alexander Korotkov
On Wed, Mar 13, 2013 at 11:10 PM, Heikki Linnakangas < hlinnakan...@vmware.com> wrote: > On 01.03.2013 16:22, Alexander Korotkov wrote: > >> On Tue, Mar 12, 2013 at 8:03 PM, Heikki Linnakangas> vmware.com >> >>> wrote: >>> >> >> So, after some hacking, I ended up with this version. I find it mor

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-13 Thread Heikki Linnakangas
On 01.03.2013 16:22, Alexander Korotkov wrote: On Tue, Mar 12, 2013 at 8:03 PM, Heikki Linnakangas wrote: So, after some hacking, I ended up with this version. I find it more readable, I hope I didn't miss anything. This seems to produce results that are close, but not identical, to the origin

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-12 Thread Alexander Korotkov
Hi! Thanks for your work on this patch! On Tue, Mar 12, 2013 at 8:03 PM, Heikki Linnakangas wrote: > So, after some hacking, I ended up with this version. I find it more > readable, I hope I didn't miss anything. This seems to produce results that > are close, but not identical, to the original

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-12 Thread Heikki Linnakangas
On 01.03.2013 16:22, Alexander Korotkov wrote: I've been staring at this code for a very long time now, trying to understand how the math in calc_hist_selectivity_contained works. I think I understand it now, but it probably needs a lot more comments and perhaps some refactoring, so that the next

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-12 Thread Heikki Linnakangas
On 01.03.2013 16:22, Alexander Korotkov wrote: These changes were made in attached patch. Thanks. I've been staring at this code for a very long time now, trying to understand how the math in calc_hist_selectivity_contained works. I think I understand it now, but it probably needs a lot more

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-03-01 Thread Alexander Korotkov
On Wed, Feb 13, 2013 at 5:55 PM, Alexander Korotkov wrote: > On Wed, Feb 13, 2013 at 5:28 PM, Heikki Linnakangas < > hlinnakan...@vmware.com> wrote: > >> On 04.01.2013 10:42, Alexander Korotkov wrote: >> >>> /* >>> * Calculate selectivity of "&&" operator using histograms of range >>> lower bound

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-02-13 Thread Alexander Korotkov
On Wed, Feb 13, 2013 at 5:28 PM, Heikki Linnakangas wrote: > On 04.01.2013 10:42, Alexander Korotkov wrote: > >> /* >> * Calculate selectivity of "&&" operator using histograms of range lower >> bounds >> * and histogram of range lengths. >> */ >> static double >> calc_hist_selectivity_overlap

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-02-13 Thread Heikki Linnakangas
On 04.01.2013 10:42, Alexander Korotkov wrote: /* * Calculate selectivity of "&&" operator using histograms of range lower bounds * and histogram of range lengths. */ static double calc_hist_selectivity_overlap(TypeCacheEntry *typcache, RangeBound *lower,

Re: [HACKERS] Statistics and selectivity estimation for ranges

2013-01-04 Thread Alexander Korotkov
On Mon, Dec 10, 2012 at 11:21 PM, Jeff Davis wrote: > > And I have a few other questions/comments: > > * Why is "summ" spelled with two "m"s? Is it short for "summation"? If > so, might be good to use "summation of" instead of "integrate" in the > comment. > Fixed. > * Why does get_length_hist_

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-12-13 Thread Alexander Korotkov
Hi, Jeff! Thanks a lot for review! On Mon, Dec 10, 2012 at 11:21 PM, Jeff Davis wrote: > It looks like there are still some problems with this patch. > > CREATE TABLE foo(ir int4range); > insert into foo select 'empty' from generate_series(1,1); > insert into foo select int4range(NULL

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-12-10 Thread Jeff Davis
It looks like there are still some problems with this patch. CREATE TABLE foo(ir int4range); insert into foo select 'empty' from generate_series(1,1); insert into foo select int4range(NULL, g, '(]') from generate_series(1,100) g; insert into foo select int4range(g, NULL, '[)')

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-12-08 Thread Andres Freund
On 2012-11-05 22:10:50 -0800, Jeff Davis wrote: > On Mon, 2012-11-05 at 11:12 -0300, Alvaro Herrera wrote: > > What's going on with this patch? I haven't seen any activity in a > > while. Should I just move this to the next commitfest? > > Sorry, I dropped the ball here. I will still review it, w

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-11-05 Thread Jeff Davis
On Mon, 2012-11-05 at 11:12 -0300, Alvaro Herrera wrote: > What's going on with this patch? I haven't seen any activity in a > while. Should I just move this to the next commitfest? Sorry, I dropped the ball here. I will still review it, whether it makes this commitfest or not. Regards,

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-11-05 Thread Alvaro Herrera
What's going on with this patch? I haven't seen any activity in a while. Should I just move this to the next commitfest? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hacker

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-10-18 Thread Alvaro Herrera
Heikki, would you be able to give this patch a look and perhaps commit it? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscr

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-10-09 Thread Alexander Korotkov
On Mon, Oct 1, 2012 at 3:22 AM, Jeff Davis wrote: > On Tue, 2012-09-04 at 17:27 +0400, Alexander Korotkov wrote: > > Addon patch is attached. Actually, I don't get your intention of > > introducing STATISTIC_KIND_RANGE_EMPTY_FRAC stakind. Did you plan to > > leave it as empty frac in distinct sta

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-09-30 Thread Jeff Davis
On Tue, 2012-09-04 at 17:27 +0400, Alexander Korotkov wrote: > Addon patch is attached. Actually, I don't get your intention of > introducing STATISTIC_KIND_RANGE_EMPTY_FRAC stakind. Did you plan to > leave it as empty frac in distinct stakind or replace this stakind > with STATISTIC_KIND_LENGTH_HI

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-09-04 Thread Alexander Korotkov
On Mon, Aug 27, 2012 at 5:00 PM, Heikki Linnakangas < heikki.linnakan...@enterprisedb.com> wrote: > On 24.08.2012 18:51, Heikki Linnakangas wrote: > >> On 20.08.2012 00:31, Alexander Korotkov wrote: >> >>> New version of patch. >>> * Collect new stakind STATISTIC_KIND_BOUNDS_**HISTOGRAM, which is

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-27 Thread Alexander Korotkov
On Mon, Aug 27, 2012 at 5:00 PM, Heikki Linnakangas < heikki.linnakan...@enterprisedb.com> wrote: > On 24.08.2012 18:51, Heikki Linnakangas wrote: > >> On 20.08.2012 00:31, Alexander Korotkov wrote: >> >>> New version of patch. >>> * Collect new stakind STATISTIC_KIND_BOUNDS_**HISTOGRAM, which is

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-27 Thread Heikki Linnakangas
On 24.08.2012 18:51, Heikki Linnakangas wrote: On 20.08.2012 00:31, Alexander Korotkov wrote: New version of patch. * Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower and upper bounds histograms combined into single ranges array, instead of STATISTIC_KIND_HISTOGRAM. One worr

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-24 Thread Heikki Linnakangas
On 20.08.2012 00:31, Alexander Korotkov wrote: On Thu, Aug 16, 2012 at 4:40 PM, Heikki Linnakangas< heikki.linnakan...@enterprisedb.com> wrote: On 15.08.2012 11:34, Alexander Korotkov wrote: Ok, we've to decide if we need "standard" histogram. In some cases it can be used for more accurate e

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-21 Thread Heikki Linnakangas
On 20.08.2012 00:31, Alexander Korotkov wrote: On Thu, Aug 16, 2012 at 4:40 PM, Heikki Linnakangas< heikki.linnakan...@enterprisedb.com> wrote: On 15.08.2012 11:34, Alexander Korotkov wrote: Ok, we've to decide if we need "standard" histogram. In some cases it can be used for more accurate e

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-19 Thread Alexander Korotkov
On Thu, Aug 16, 2012 at 4:40 PM, Heikki Linnakangas < heikki.linnakan...@enterprisedb.com> wrote: > On 15.08.2012 11:34, Alexander Korotkov wrote: > >> Ok, we've to decide if we need "standard" histogram. In some cases it can >> be used for more accurate estimation of< and> operators. >> But I t

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-16 Thread Heikki Linnakangas
On 15.08.2012 11:34, Alexander Korotkov wrote: On Wed, Aug 15, 2012 at 12:14 PM, Heikki Linnakangas< heikki.linnakan...@enterprisedb.com> wrote: Histogram of upper bounds would be both more accurate and natural for some operators. However, it requires collecting additional statistics while AF

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-16 Thread Heikki Linnakangas
On 15.08.2012 11:34, Alexander Korotkov wrote: On Wed, Aug 15, 2012 at 12:14 PM, Heikki Linnakangas< heikki.linnakan...@enterprisedb.com> wrote: Histogram of upper bounds would be both more accurate and natural for some operators. However, it requires collecting additional statistics while AF

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-15 Thread Tom Lane
Alexander Korotkov writes: > Ok, we've to decide if we need "standard" histogram. In some cases it can > be used for more accurate estimation of < and > operators. > But I think it is not so important. So, we can replace "standard" histogram > with histograms of lower and upper bounds? You should

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-15 Thread Alexander Korotkov
On Wed, Aug 15, 2012 at 12:14 PM, Heikki Linnakangas < heikki.linnakan...@enterprisedb.com> wrote: > Histogram of upper bounds would be both more >> accurate and natural for some operators. However, it requires collecting >> additional statistics while AFAICS it doesn't liberate us from having >>

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-15 Thread Heikki Linnakangas
On 15.08.2012 10:38, Alexander Korotkov wrote: On Tue, Aug 14, 2012 at 7:46 PM, Heikki Linnakangas< heikki.linnakan...@enterprisedb.com> wrote: It would be quite easy to provide reasonable estimates for those operators, if we had a separate histogram of upper bounds. I also note that the estim

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-15 Thread Alexander Korotkov
On Tue, Aug 14, 2012 at 7:46 PM, Heikki Linnakangas < heikki.linnakan...@enterprisedb.com> wrote: > On 14.08.2012 09:45, Alexander Korotkov wrote: > >> After fixing few more bugs, I've a version with much more reasonable >> accuracy. >> > > Great! One little thing just occurred to me: > > You're r

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-14 Thread Heikki Linnakangas
On 14.08.2012 09:45, Alexander Korotkov wrote: After fixing few more bugs, I've a version with much more reasonable accuracy. Great! One little thing just occurred to me: You're relying on the regular scalar selectivity estimators for the <<, >>, &< and &> operators. That seems bogus, in part

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-13 Thread Alexander Korotkov
On Mon, Aug 13, 2012 at 1:11 AM, Alexander Korotkov wrote: > On Thu, Aug 9, 2012 at 12:44 AM, Alexander Korotkov > wrote: > >> My conclusion is so, that current errors are probably ok for selectivity >> estimation. But taking into attention that generated datasets ideally fits >> assumptions of e

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-12 Thread Alexander Korotkov
On Thu, Aug 9, 2012 at 12:44 AM, Alexander Korotkov wrote: > My conclusion is so, that current errors are probably ok for selectivity > estimation. But taking into attention that generated datasets ideally fits > assumptions of estimation, there could be room for improvement. Especially, > it's un

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-09 Thread Alexander Korotkov
New revision of patch with two fixes: 1) Check if histogram bin width is zero in get_position. 2) Check statsTuple is valid tuple in rangecontsel. -- With best regards, Alexander Korotkov. range_stat-0.3.patch.gz Description: GNU Zip compressed data -- Sent via pgsql-hackers mailing list (

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-07 Thread Matthias
Having statistics on ranges was really missing! The planner was doing some really, really bad choices on bigger tables regarding seq/random scans, nested loop/other joins etc. Is there any chance this makes it into 9.2 final? It would really round-off the introduction of range types and maybe avoi

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-06 Thread Alexander Korotkov
On Mon, Aug 6, 2012 at 6:09 PM, Heikki Linnakangas < heikki.linnakan...@enterprisedb.com> wrote: > On 04.08.2012 12:31, Alexander Korotkov wrote: > >> Hackers, >> >> attached patch is for collecting statistics and selectivity estimation for >> ranges. >> >> In order to make our estimations accurat

Re: [HACKERS] Statistics and selectivity estimation for ranges

2012-08-06 Thread Heikki Linnakangas
On 04.08.2012 12:31, Alexander Korotkov wrote: Hackers, attached patch is for collecting statistics and selectivity estimation for ranges. In order to make our estimations accurate for every distribution of ranges, we would collect 2d-distribution of lower and upper bounds of range into some ki