2010/1/18 Teodor Sigaev <teo...@sigaev.ru>: >> So for example if your bloom filter is 4 bits per column your error >> rate for a single column where clause will be 1/2^(4/1.44) or a little >> worse than returning 1 record in 7. If you test two or three columns >> then it'll be about 1 in 49 or 1 in 343. > > Hmm, I don't understand your calculations. In your example (4 bits per > column, index has only one column) each row is hashed by 4 bits in 80-bit > vector (signature), so, probability of collision is (1/80)^4
Sorry, I meant if you had n columns in the index, and the signature was 4*n bits, and the where clause has one key in the search condition. According to some guy editing the wikipedia page the number of bits per value in the set a bloom filter needs to achieve an error rate of e is 1.44*log_2(1/e). So the error rate for 4 bits per column would be 1/2^(1.44*nbits). That's larger than the error rate for a simple hash which would be 1/2^nbits. The reason bloom filters are advantageous when checking for a value *anywhere* in the set is that with a simple hash of each value you would have that error rate for each check, so you would have a much higher false positive rate for finding it somewhere. But in your case you know which position you want to check. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers