I wonder if we have ever tested the reasoning behind SKIP_PAGES_THRESHOLD and the magic number 32 currently assigned to it. While looking at the code after a long time and running some tests, I noticed that a manual VACUUM always scans the first 31 pages of a relation that has received no write activity since the last VACUUM. On closer inspection, I realized this hook was put in deliberately: we use the visibility map to skip pages only when we see a run of at least SKIP_PAGES_THRESHOLD consecutive all-visible pages, so as to take advantage of possible OS sequential-read (read-ahead) optimizations.
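To double-check my reading, here is a toy model of that rule (a sketch of how I understand the skipping decision, not the actual lazy_scan_heap code; pages_scanned, the 100K-page relation and the every-25th-page update pattern are all made up for illustration):

#include <stdio.h>
#include <stdbool.h>
#include <stdlib.h>

#define SKIP_PAGES_THRESHOLD 32

/*
 * Toy model: a page is skipped only when it lies inside a run of at
 * least SKIP_PAGES_THRESHOLD consecutive all-visible pages; everything
 * else, including the not-all-visible page that breaks a run, is read.
 */
static long
pages_scanned(const bool *all_visible, long nblocks)
{
    long scanned = 0;
    long blkno = 0;

    while (blkno < nblocks)
    {
        /* length of the all-visible run starting at blkno */
        long run = 0;

        while (blkno + run < nblocks && all_visible[blkno + run])
            run++;

        if (run >= SKIP_PAGES_THRESHOLD)
            blkno += run;       /* run is long enough: skip it */
        else
        {
            scanned += run;     /* short run: scan it anyway */
            blkno += run;
        }

        if (blkno < nblocks)
        {
            scanned++;          /* the not-all-visible page itself */
            blkno++;
        }
    }
    return scanned;
}

int
main(void)
{
    long    nblocks = 100000;   /* hypothetical 100K-page relation */
    bool   *vm = malloc(nblocks * sizeof(bool));
    long    i;

    /* mark roughly 4% of pages not-all-visible, spread out evenly */
    for (i = 0; i < nblocks; i++)
        vm[i] = (i % 25 != 0);

    printf("scanned %ld of %ld pages\n",
           pages_scanned(vm, nblocks), nblocks);
    free(vm);
    return 0;
}

With about 4% of pages not-all-visible but spread out evenly, this model reads every single page, because no all-visible run ever reaches 32.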
My statistical skills are limited, but wouldn't that mean that for fairly well-distributed write activity across a large table, even 3-4% updates/deletes would make us hit a not-all-visible page in almost every 32-page window? In that case nearly the entire relation would be scanned, even though the visibility map tells us that only 3-4% of the pages need scanning, and the probability only grows as the percentage of updated/deleted tuples increases. Given that the likelihood of anyone running VACUUM (manually or through autovacuum settings) on a table with less than 3-4% updates/deletes is very low, I am worried we might be losing all the advantages of the visibility map for a fairly common use case.

Do we have any numbers showing that what we have today is good? Sorry, I may not have followed the past discussions very closely, so I'm not sure whether this has already been debated/tested.

Thanks,
Pavan

--
Pavan Deolasee
EnterpriseDB
http://www.enterprisedb.com
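PS: As a quick back-of-envelope check (assuming, purely for the model, that not-all-visible pages land independently and uniformly over the relation at fraction f): a fixed window of 32 consecutive pages is entirely all-visible with probability (1 - f)^32, i.e. about 0.97^32 ≈ 0.38 for f = 0.03 and 0.96^32 ≈ 0.27 for f = 0.04. So even under this generous model, roughly two-thirds to three-quarters of the 32-page windows already contain a page we must read.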