On 09/14/2016 05:17 PM, Robert Haas wrote:
I am kind of doubtful about this whole line of investigation because
we're basically trying pretty hard to fix something that I'm not sure
is broken.    I do agree that, all other things being equal, the TID
lookups will probably be faster with a bitmap than with a binary
search, but maybe not if the table is large and the number of dead
TIDs is small, because cache efficiency is pretty important.  But even
if it's always faster, does TID lookup speed even really matter to
overall VACUUM performance? Claudio's early results suggest that it
might, but maybe that's just a question of some optimization that
hasn't been done yet.

Regarding the lookup performance, I don't think the bitmap alone can significantly improve that - it's more efficient memory-wise, no doubt about that, but it's still likely larger than CPU caches and accessed mostly randomly (when vacuuming the indexes).

IMHO the best way to speed-up lookups (if it's really an issue, haven't done any benchmarks) would be to build a small bloom filter in front of the TID array / bitmap. It shall be fairly small (depending on the number of TIDs, error rate etc.) and likely to fit into L2/L3, and eliminate a lot of probes into the much larger array/bitmap.

Of course, it's another layer of complexity - the good thing is we don't need to build the filter until after we collect the TIDs, so we got pretty good inputs for the bloom filter parameters.

But all this is based on the assumption that the lookups are actually expensive, not sure about that.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to