On Fri, Apr 28, 2017 at 9:02 PM, Peter Geoghegan <p...@bowt.ie> wrote:
> I'd like to hear feedback on the general idea, and what the
> user-visible interface ought to look like. The non-deterministic false
> negatives may need to be considered by the user-visible interface,
> which is the main reason I mention it.

Bloom filters are one of those things that come up on this mailing
list incredibly frequently but rarely get used in committed code; thus
far, contrib/bloom is the only example we've got, and not for lack of
other proposals.  One problem is that Bloom filters assume you can get
n independent hash functions for a given value, which we have not got.
That problem would need to be solved somehow.  If you only have one
hash function, the Bloom filter needed to reach a given false-positive
rate probably gets very large.
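
To put rough numbers on that, here is a minimal Python sketch using the
standard Bloom filter approximation p = (1 - exp(-k*n/m))^k.  It also
shows one common workaround for the missing hash functions, which is a
swapped-in technique and not anything the proposal specifies: derive k
probe positions from two halves of a single wide digest ("double
hashing").  The names bloom_bits, probe_positions and BloomFilter are
made up for illustration, and BLAKE2b is only a stand-in for whatever
hash the server would really use.

```python
import hashlib
import math


def bloom_bits(n, p, k=None):
    """Bits needed for n items at false-positive rate p.

    k=None assumes the optimal number of hash functions;
    otherwise k is the fixed number of hash functions available.
    """
    if k is None:
        return math.ceil(-n * math.log(p) / math.log(2) ** 2)
    # Solve p = (1 - exp(-k*n/m))**k for m.
    return math.ceil(-k * n / math.log(1.0 - p ** (1.0 / k)))


def probe_positions(data, k, m):
    """Derive k bit positions from one 128-bit digest (double hashing)."""
    digest = hashlib.blake2b(data, digest_size=16).digest()
    h1 = int.from_bytes(digest[:8], "little")
    # Force an odd stride so that it is never zero.
    h2 = int.from_bytes(digest[8:], "little") | 1
    return [(h1 + i * h2) % m for i in range(k)]


class BloomFilter:
    """Toy bitmap-backed filter; illustrative only."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def add(self, data):
        for pos in probe_positions(data, self.k, self.m):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, data):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in probe_positions(data, self.k, self.m)
        )
```

Under that approximation, bloom_bits(1_000_000, 0.01, k=1) comes out
roughly ten times larger than bloom_bits(1_000_000, 0.01), so for a 1%
false-positive rate the single-hash penalty is about an order of
magnitude, and it grows as the target rate shrinks.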

When hashing index and heap tuples, do you propose to include the heap
TID in the data getting hashed?  I think that would be a good idea,
because otherwise you're only verifying that every heap tuple has an
index pointer pointing at something, not that every heap tuple has an
index tuple pointing at the right thing.
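
A toy illustration of the difference, as a sketch only: the heap is
modelled as a dict from TID to key, the index as a list of (key, TID)
pairs, and an exact Python set stands in for the Bloom filter so that
false positives do not muddy the point.  fingerprint and
missing_heap_tuples are hypothetical names, not anything from the
proposed patch.

```python
def fingerprint(key, tid=None):
    """Bytes that would be hashed into the filter.

    tid is a (block, offset) pair, or None to hash the key alone.
    """
    if tid is None:
        return key.encode()
    return f"{key}|{tid[0]},{tid[1]}".encode()


def missing_heap_tuples(heap, index_entries, include_tid):
    """Return TIDs of heap tuples with no matching index fingerprint."""
    # Pass 1: fingerprint every index tuple.
    seen = {
        fingerprint(key, tid if include_tid else None)
        for key, tid in index_entries
    }
    # Pass 2: check that every heap tuple's fingerprint was seen.
    return [
        tid
        for tid, key in heap.items()
        if fingerprint(key, tid if include_tid else None) not in seen
    ]


heap = {(0, 1): "a", (0, 2): "b"}
# The index entry for "b" points at the wrong TID.
index_entries = [("a", (0, 1)), ("b", (0, 7))]
```

With this data, missing_heap_tuples(heap, index_entries, False) reports
nothing, because some index tuple carries the key "b".  With
include_tid=True it reports (0, 2), the heap tuple whose index pointer
went astray.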

I wonder if it's also worth having a zero-error mode, even if it runs
for a long time.  Scan the heap, and probe the index for the value
computed from each heap tuple.  Maybe that's so awful that nobody
would ever use it, but I'm not sure.  It might actually be simpler to
implement than what you have in mind.
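
In the same toy model, the zero-error mode might look like the sketch
below.  A sorted Python list probed with bisect stands in for a B-tree
descent; verify_heap_against_index is a hypothetical name, and nothing
here reflects real PostgreSQL index access code.

```python
import bisect


def verify_heap_against_index(heap, index_entries):
    """Probe the index once per heap tuple; return unmatched TIDs.

    heap maps TID -> key; index_entries is a sorted list of (key, TID).
    """
    missing = []
    for tid, key in heap.items():
        entry = (key, tid)
        # One "descent" per heap tuple: find the first entry >= (key, tid).
        i = bisect.bisect_left(index_entries, entry)
        if i == len(index_entries) or index_entries[i] != entry:
            missing.append(tid)
    return missing
```

There are no false negatives or false positives here, but the cost is
one index probe per heap tuple, which is where the long run time would
come from.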

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)