Florian Pflug <f...@phlo.org> wrote:

> Maybe one can get rid of these sorts of problems by factoring in
> the expected density of the table beforehand and simply accepting
> that the results will be inaccurate if the statistics are
> outdated?
>
> One could, for example, simply pick
>
>   N := SamplingPercentage * MaxTuplesPerPage / AvgLiveTuplesPerPage
>
> where
>
>   AvgLiveTuplesPerPage := #Tuples / #Pages
>
> random TIDs, fetch the live ones, and return them.

To clarify, I read this as using reltuples and relpages for the
table, and returning only tuples which are visible according to the
query's snapshot. (That is, I think you used "live" to mean two
different things there.)

Unless I'm missing something, that works for percentage selection,
which is what the standard talks about, without any need to iterate
through additional samples. Good idea! We don't need to do any
second pass to pare down the initial results, either. This greatly
simplifies the coding while providing exactly what the standard
requires.

> I'm not totally sure whether this approach is sensible to
> non-uniformity in the tuple to line-pointer assignment, though.

I think we can solve that by going high enough with tuple numbers
to reach the highest tuple ID that might be in use in the table,
and *not* following HOT chains. (If we follow HOT chains, we could
have several distinct ctid values which return the same tuple.) Or
am I thinking about this incorrectly?

> [more complex alternatives]

I really think your first suggestion covers it perfectly; these
more complex techniques don't seem necessary to me.

-Kevin
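P.S. The arithmetic being discussed can be sketched roughly as
below. This is only an illustration of the idea, not PostgreSQL
code: the table statistics, the scaling of N by the table's tuple
count, and the is_live() visibility stand-in are all hypothetical.

```python
import math
import random

# Hypothetical table statistics, standing in for reltuples/relpages.
reltuples = 10_000           # planner's live-tuple estimate
relpages = 100               # number of heap pages
max_tuples_per_page = 291    # illustrative per-page slot ceiling

avg_live_per_page = reltuples / relpages   # AvgLiveTuplesPerPage


def is_live(tid):
    # Toy visibility check: pretend each page keeps exactly
    # avg_live_per_page live tuples in its lowest-numbered slots.
    # In reality this would be a snapshot visibility test.
    page, offset = tid
    return offset < avg_live_per_page


def sample_tids(sampling_fraction, rng=random.Random(42)):
    """Pick N random TIDs and keep only the 'live' ones."""
    # A random TID hits a live tuple with probability roughly
    # avg_live_per_page / max_tuples_per_page, so inflate N by the
    # inverse ratio to land near the requested fraction of tuples.
    n = math.ceil(sampling_fraction * reltuples
                  * max_tuples_per_page / avg_live_per_page)
    tids = [(rng.randrange(relpages),
             rng.randrange(max_tuples_per_page))
            for _ in range(n)]
    return [tid for tid in tids if is_live(tid)]
```

With a 10% sampling fraction this draws 2910 random TIDs here and
returns close to 1000 live tuples in expectation; no second pass is
needed, and stale statistics simply skew the returned percentage.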
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers