On Mon, 2006-09-18 at 17:46 +0200, Matteo Beccati wrote:
> Tom Lane wrote:
> > Matteo Beccati <[EMAIL PROTECTED]> writes:
> >> I cannot see anything bad by using something like that:
> >> if (histogram is large/representative enough)
> >
> > Well, the question is exactly what is "large enough"? I feel a bit
> > uncomfortable about applying the idea to a histogram with only 10
> > entries (especially if we ignore two of 'em). With 100 or more,
> > it sounds all right. What's the breakpoint?
>
> Yes, I think 100-200 could be a good breakpoint. I don't actually know
> what the current usage of SET STATISTICS is; I usually set it to 1000
> for columns which need more precise selectivity.
>
> The breakpoint could be set even higher (500?) so there is space to
> increase statistics without enabling the histogram check, but I don't
> feel very comfortable suggesting this kind of possibly undocumented
> side effect...
Hi everyone,

You may be interested to have a look at the statistics collector for the
geometry type within PostGIS. To prevent very large or very small
geometries from ruining the statistics histogram and generating incorrect
query plans, we assume that the column distribution is likely to be close
to normal, and then remove any ANALYZE-collected geometries from the set
that lie outside +/- 3.25 standard deviations from the mean before
creating the final histogram (this removes well under 1% of the data from
each end of an assumed normal distribution).

This works well, and AFAIK we've only ever had one reported case of an
incorrect query plan being generated using this method.

Kind regards,

Mark.
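For anyone curious, the trimming step described above can be sketched roughly as follows. This is an illustrative Python sketch, not the actual PostGIS C code; the function name `trim_outliers` and the sample data are my own invention, and only the +/- 3.25 standard deviation cutoff comes from the description above:

```python
import random
import statistics

def trim_outliers(samples, n_sigma=3.25):
    """Drop samples farther than n_sigma standard deviations from the mean.

    Hypothetical sketch of the approach described above: assuming the
    values are roughly normally distributed, anything outside
    +/- n_sigma stddevs is treated as an outlier and excluded before
    the histogram is built.
    """
    if len(samples) < 2:
        return list(samples)
    mean = statistics.mean(samples)
    sd = statistics.stdev(samples)
    if sd == 0:
        return list(samples)
    lo, hi = mean - n_sigma * sd, mean + n_sigma * sd
    return [x for x in samples if lo <= x <= hi]

# Hypothetical sample: 1000 normal-ish values plus two extreme ones,
# standing in for a few huge/tiny geometries collected by ANALYZE.
random.seed(42)
data = [random.gauss(100.0, 10.0) for _ in range(1000)] + [10000.0, -10000.0]
trimmed = trim_outliers(data)
print(len(data), len(trimmed))  # → 1002 1000
```

Note that a single extreme value inflates the sample standard deviation itself, so the cutoff is wider than 3.25 "true" deviations; even so, the two extreme values land far outside it and are dropped, while all of the normal-ish values survive.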