On 09/14/2016 05:17 PM, Robert Haas wrote:
I am kind of doubtful about this whole line of investigation because we're basically trying pretty hard to fix something that I'm not sure is broken. I do agree that, all other things being equal, the TID lookups will probably be faster with a bitmap than with a binary search, but maybe not if the table is large and the number of dead TIDs is small, because cache efficiency is pretty important. But even if it's always faster, does TID lookup speed even really matter to overall VACUUM performance? Claudio's early results suggest that it might, but maybe that's just a question of some optimization that hasn't been done yet.
Regarding the lookup performance, I don't think the bitmap alone can significantly improve that - it's more efficient memory-wise, no doubt about that, but it's still likely larger than CPU caches and accessed mostly randomly (when vacuuming the indexes).
IMHO the best way to speed-up lookups (if it's really an issue, haven't done any benchmarks) would be to build a small bloom filter in front of the TID array / bitmap. It shall be fairly small (depending on the number of TIDs, error rate etc.) and likely to fit into L2/L3, and eliminate a lot of probes into the much larger array/bitmap.
Of course, it's another layer of complexity - the good thing is we don't need to build the filter until after we collect the TIDs, so we got pretty good inputs for the bloom filter parameters.
But all this is based on the assumption that the lookups are actually expensive, not sure about that.
regards Tomas -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers