Hi, On 10/27/2017 05:22 PM, Sokolov Yura wrote: > > Hi, Tomas > > BRIN bloom index is a really cool feature, that definitely should be in > core distribution (either in contrib or builtin)!!! > > Small suggestion for algorithm: > > It is well known practice not to calculate whole hash function for every > point, but use double hashing approach. > Example of proving double hashing approach for bloom filters: > https://www.eecs.harvard.edu/~michaelm/postscripts/rsa2008.pdf > > So, instead of: > for (i = 0; i < filter->nhashes; i++) > { > /* compute the hashes with a seed, used for the bloom filter */ > uint32 h = ((uint32)DatumGetInt64(hash_uint32_extended(value, > i))) % filter->nbits; > > /* set or check bit h */ > } > > such construction could be used: > /* compute the hashes, used for the bloom filter */ > uint32 big_h = ((uint32)DatumGetInt64(hash_uint32_extended(value, i))); > uint32 h = big_h % filter->nbits; > /* ensures d is never >= filter->nbits */ > uint32 d = big_h % (filter->nbits - 1); > > for (i = 0; i < filter->nhashes; i++) > { > /* set or check bit h */ > > /* compute next bit h */ > h += d++; > if (h >= filter->nbits) h -= filter->nbits; > if (d == filter->nbits) d = 0; > } > > Modulo of one 64bit result by two coprime numbers (`nbits` and `nbits-1`) > gives two good-quality functions usable for double hashing. >
OK, thanks for the suggestion. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers