On Fri, Apr 28, 2017 at 9:02 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> I'd like to hear feedback on the general idea, and what the
> user-visible interface ought to look like. The non-deterministic false
> negatives may need to be considered by the user visible interface,
> which is the main reason I mention it.

Bloom filters are one of those things that come up on this mailing list incredibly frequently but rarely get used in committed code; thus far, contrib/bloom is the only example we've got, and not for lack of other proposals. One problem is that Bloom filters assume you can get n independent hash functions for a given value, which we have not got. That problem would need to be solved somehow. If you only have one hash function, the size of the required bloom filter probably gets very large.

When hashing index and heap tuples, do you propose to include the heap TID in the data getting hashed? I think that would be a good idea, because otherwise you're only verifying that every heap tuple has an index pointer pointing at something, not that every heap tuple has an index tuple pointing at the right thing.

I wonder if it's also worth having a zero-error mode, even if it runs for a long time. Scan the heap, and probe the index for the value computed from each heap tuple. Maybe that's so awful that nobody would ever use it, but I'm not sure. It might actually be simpler to implement than what you have in mind.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (email@example.com)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
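
A rough sketch of two of the points raised in this message, under stated assumptions. It is not taken from any patch in this thread. To work around having only one hash function, it derives several probe positions from a single digest by double hashing, which is a technique swapped in here for illustration rather than something proposed above. It also folds the heap TID into the hashed bytes, so an index entry pointing at the wrong tuple does not count as a match. The names `BloomFilter`, `fingerprint`, and `missing_from_index` are hypothetical, the TID is modelled as a plain (block, offset) pair, and SHA-256 merely stands in for whatever hash the server would really use.

```python
import hashlib
from typing import Iterable, Iterator, List, Tuple

Tid = Tuple[int, int]  # hypothetical stand-in for a heap TID: (block, offset)


class BloomFilter:
    """Bloom filter that derives all probe positions from one digest."""

    def __init__(self, nbits: int, nhashes: int) -> None:
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray((nbits + 7) // 8)

    def _positions(self, data: bytes) -> Iterator[int]:
        # Double hashing: split one digest into two values and combine
        # them as h1 + i * h2 to simulate nhashes hash functions.
        digest = hashlib.sha256(data).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd, never zero
        for i in range(self.nhashes):
            yield (h1 + i * h2) % self.nbits

    def add(self, data: bytes) -> None:
        for pos in self._positions(data):
            self.bits[pos >> 3] |= 1 << (pos & 7)

    def might_contain(self, data: bytes) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos >> 3] & (1 << (pos & 7))
                   for pos in self._positions(data))


def fingerprint(key: bytes, tid: Tid) -> bytes:
    """Bytes to hash: the TID first, then the indexed key."""
    block, offset = tid
    return block.to_bytes(4, "big") + offset.to_bytes(2, "big") + key


def missing_from_index(index_entries: Iterable[Tuple[bytes, Tid]],
                       heap_tuples: Iterable[Tuple[bytes, Tid]],
                       nbits: int = 1 << 16,
                       nhashes: int = 4) -> List[Tuple[bytes, Tid]]:
    """Return heap tuples whose (key, TID) was definitely not in the index."""
    bf = BloomFilter(nbits, nhashes)
    for key, tid in index_entries:
        bf.add(fingerprint(key, tid))
    return [(key, tid) for key, tid in heap_tuples
            if not bf.might_contain(fingerprint(key, tid))]
```

Because the filter can report a false positive, a missing index entry can occasionally go unreported; anything the function does return is a real discrepancy. Dropping the TID from `fingerprint` would reduce the check to "some index entry has this key", which is the weaker guarantee described above.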