On 24.12.2010 04:41, Florian Pflug wrote:
> The filter size could be derived from the table's statistics target, or
> be otherwise user-definable. We could also auto-resize once it gets too
> full. But still, that all seems awfully complex :-(
Using the statistics target is a good idea, I think - we could use it to
determine the error rate of the filter. Something like

   error rate = 10 - (statistics_target - 100) / 100

which gives

   1% for statistics target 1000
   10% for statistics target 100

or maybe something like this, where the error rate grows faster for smaller
statistics target values:

   error rate = 1 + 91000 / (statistics_target^2)

which gives about

   1% for statistics target 1000
   10% for statistics target 100
   36% for statistics target 50

But I guess a 10% error rate is about the worst we can accept, so it does
not make much sense to use lower statistics target values.

>> Another possibility is to collect the data from just a small portion
>> of a table and then use the result to estimate the number of distinct
>> values for the whole table. But I'm not sure we can do this reliably,
>> I see many traps in this.
>
> This is how it works currently. The problem with this approach is that
> it gives you very few guarantees about how precise the result will be.
> Extrapolating works very well for things like MCVs and histograms, because
> there you're by definition interested mostly in values which occur often -
> and thus with a high probability in the relatively few rows you sample.
> For the number of distinct values, however, this isn't true - only if
> ndistinct is an order of magnitude smaller than the number of rows can
> relatively few rows account for a large percentage of the distinct values...

That basically means we need to sample a large portion of the table :-(

> Another idea would be to obtain the ndistinct values from an index somehow.
> Postgres cannot currently scan an index in physical order, only in logical
> order, due to locking considerations. But since we'd only be interested in
> an estimate, maybe a scan in physical block order would work for ndistinct
> estimates? Just a wild idea, mind you, I haven't checked at all if that'd
> be even remotely feasible.

I was thinking about that too, and I think we could do this using the
pageinspect contrib module. Sure, there might be a problem with bloated
indexes. And relying on this actually means a multi-column index on all the
columns is required - individual indexes are not enough, as we need the
number of distinct combinations too.

regards
Tomas
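
A minimal sketch of what such an error rate would mean for the filter size,
assuming the linear statistics_target mapping above and the standard Bloom
filter sizing formulas m = -n*ln(p)/(ln 2)^2 and k = (m/n)*ln 2; the function
names and the one-million-distinct-values example are arbitrary illustrations,
not PostgreSQL code:

    import math

    def error_rate(statistics_target):
        """Error rate (percent) derived from the statistics target, using
        the linear mapping sketched above: 10% at the default target of 100,
        1% at target 1000. This mapping is an assumption, not existing code."""
        return 10.0 - (statistics_target - 100) / 100.0

    def bloom_filter_size(expected_ndistinct, statistics_target):
        """Size a Bloom filter for the expected number of distinct values and
        the derived false-positive rate, using the textbook formulas
        m = -n*ln(p) / (ln 2)^2 bits and k = (m/n) * ln 2 hash functions."""
        p = error_rate(statistics_target) / 100.0
        n = expected_ndistinct
        m = math.ceil(-n * math.log(p) / (math.log(2) ** 2))
        k = max(1, round((m / n) * math.log(2)))
        return m, k

    # One million expected distinct values:
    print(bloom_filter_size(1_000_000, 100))    # ~4.8M bits (~0.6 MB), 3 hash functions
    print(bloom_filter_size(1_000_000, 1000))   # ~9.6M bits (~1.2 MB), 7 hash functions

Under these assumptions, going from a 10% to a 1% error rate roughly doubles
the filter size, which is what makes tying the error rate to the statistics
target an attractive knob.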
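
A minimal simulation of the sampling problem discussed above, assuming a
uniformly distributed column, a one-million-row table and a 1% row sample
(all arbitrary choices): it shows how little of the distinct-value set a
small sample covers unless ndistinct is much smaller than the row count.

    import random

    def fraction_of_distinct_seen(nrows, ndistinct, sample_fraction, seed=0):
        """Build a column with `ndistinct` uniformly distributed values, take
        a random sample of the rows, and report what share of the values that
        actually occur in the column also shows up in the sample."""
        rng = random.Random(seed)
        column = [rng.randrange(ndistinct) for _ in range(nrows)]
        sample = rng.sample(column, int(nrows * sample_fraction))
        return len(set(sample)) / len(set(column))

    nrows = 1_000_000
    for ndistinct in (1_000, 100_000, 1_000_000):
        seen = fraction_of_distinct_seen(nrows, ndistinct, 0.01)
        print(f"ndistinct={ndistinct:>9}: 1% sample sees {seen:.0%} of the distinct values")

With a uniform distribution the sample covers nearly all values only in the
low-ndistinct case and undercounts badly once ndistinct approaches the sample
size, which matches the point above that a reliable ndistinct estimate needs
a large portion of the table.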