Re: [HACKERS] Bloom index

2010-01-18 Thread Tom Lane
Teodor Sigaev writes: >> Yeah, (1) rand isn't a good random number generator and (2) fooling with > I tested rand() and random() generator and results was close to the same, and > rand() was even slightly better. That is true on some platforms, but on others rand() is very bad.

Re: [HACKERS] Bloom index

2010-01-18 Thread Greg Stark
2010/1/18 Teodor Sigaev : >> So for example if your bloom filter is 4 bits per column your error >> rate for a single column where clause will be 1/2^(4/1.44) or a little >> worse than returning 1 record in 7. If you test two or three columns >> then it'll be about 1 in 49 or 1 in 343. > > Hmm, I d

Re: [HACKERS] Bloom index

2010-01-18 Thread Teodor Sigaev
So for example if your bloom filter is 4 bits per column your error rate for a single column where clause will be 1/2^(4/1.44) or a little worse than returning 1 record in 7. If you test two or three columns then it'll be about 1 in 49 or 1 in 343. Hmm, I don't understand your calculations. In y

Re: [HACKERS] Bloom index

2010-01-18 Thread Teodor Sigaev
Yeah, (1) rand isn't a good random number generator and (2) fooling with I tested rand() and random() generator and results was close to the same, and rand() was even slightly better. the main random number sequence is user-unfriendly. If you need a That's really easy to fix. private sou

Re: [HACKERS] Bloom index

2010-01-18 Thread Greg Stark
On Mon, Jan 18, 2010 at 2:29 AM, Greg Stark wrote: > Bloom filters need to be sized to have n bits per set element to > maintain a targeted false positive rate. And that false positive rate > would be higher than just having an n bit hash for each set element. > Normally they have the advantage th

Re: [HACKERS] Bloom index

2010-01-17 Thread Tom Lane
Greg Stark writes: > The only thing that jumps out at me from the code itself is the use of > rand() and srand() which seems unacceptable. Yeah, (1) rand isn't a good random number generator and (2) fooling with the main random number sequence is user-unfriendly. If you need a private source of

Re: [HACKERS] Bloom index

2010-01-17 Thread Greg Stark
2010/1/13 Teodor Sigaev : >> This is pretty darn slick.  I haven't looked at the code yet, but the > > It's just a prototype and/or proof of concept The only thing that jumps out at me from the code itself is the use of rand() and srand() which seems unacceptable. At the very least the srand(attno

Re: [HACKERS] Bloom index

2010-01-13 Thread Oleg Bartunov
On Wed, 13 Jan 2010, Robert Haas wrote: 2010/1/13 Teodor Sigaev : CREATE INDEX bloomidx ON tbloom(i1,i2,i3)       WITH (length=5, col1=2, col2=2, col3=4); Here, we create bloom index with signature length 80 bits and attributes i1, i2  mapped to 2 bits, attribute i3 - to 4 bits. This is pret

Re: [HACKERS] Bloom index

2010-01-13 Thread Teodor Sigaev
This is pretty darn slick. I haven't looked at the code yet, but the It's just a prototype and/or proof of concept functionality sounds very cool, and I hope this is something we'll be able to have as part of PG in the future (though more than likely not for 8.5, I suspect). Of course -- Te

Re: [HACKERS] Bloom index

2010-01-13 Thread Robert Haas
2010/1/13 Teodor Sigaev : > CREATE INDEX bloomidx ON tbloom(i1,i2,i3) >       WITH (length=5, col1=2, col2=2, col3=4); > > Here, we create bloom index with signature length 80 bits and attributes > i1, i2  mapped to 2 bits, attribute i3 - to 4 bits. This is pretty darn slick. I haven't looked at

[HACKERS] Bloom index

2010-01-13 Thread Teodor Sigaev
README.bloom: contrib/bloom provides signature file based index. Authors: Teodor Sigaev (teo...@sigaev.ru) and Oleg Bartunov (o...@sai.msu.su) Notes: This is a *working* prototype, which can be finishing up to production in case of interest. Particularly, it misses wal support, because of commo