On Mon, Aug 27, 2012 at 5:00 PM, Heikki Linnakangas <heikki.linnakan...@enterprisedb.com> wrote:

> On 24.08.2012 18:51, Heikki Linnakangas wrote:
>
>> On 20.08.2012 00:31, Alexander Korotkov wrote:
>>
>>> New version of patch.
>>> * Collect new stakind STATISTIC_KIND_BOUNDS_HISTOGRAM, which is lower
>>> and upper bounds histograms combined into single ranges array, instead
>>> of STATISTIC_KIND_HISTOGRAM.
>>
>> One worry I have about that format for the histogram is that you
>> deserialize all the values in the histogram, before you do the binary
>> searches. That seems expensive if stats target is very high. I guess you
>> could deserialize them lazily to alleviate that, though.
>>
>>> * Selectivity estimations for >, >=, <, <=, <<, >>, &<, &> using this
>>> histogram.
>>
>> Thanks!
>>
>> I'm going to do the same for this that I did for the sp-gist patch, and
>> punt on the more complicated parts for now, and review them separately.
>> Attached is a heavily edited version that doesn't include the length
>> histogram, and consequently doesn't do anything smart for the &< and &>
>> operators. && is estimated using the bounds histograms. There's now a
>> separate stakind for the empty range fraction, since it's not included
>> in the length-histogram.
>>
>> I tested this on a dataset containing birth and death dates of persons
>> that have a wikipedia page, obtained from the dbpedia.org project. I can
>> send a copy if someone wants it. The estimates seem pretty accurate.
>>
>> Please take a look, to see if I messed up something.
>
> Committed this with some further changes. Thanks!

Sorry for not replying to the previous message. The committed patch looks
good to me. I'm going to provide an additional patch with the length
histogram and more selectivity estimates.

------
With best regards,
Alexander Korotkov.