* Atri Sharma (atri.j...@gmail.com) wrote:
> My point is that I would like to help in the implementation, if possible. :)
Feel free to go ahead and implement it. I'm not sure when I'll have a chance to (probably not in the next week or two, anyway).

Unfortunately, the bigger issue here is really about testing the results and determining if it's actually faster/better with various data sets (including ones which have duplicates). I've got one test data set which has some interesting characteristics (for one thing, hashing the "large" side and then seq-scanning the "small" side is actually faster than going the other way, which is quite 'odd' imv for a hashing system):

http://snowman.net/~sfrost/test_case2.sql

You might also look at the other emails that I sent regarding this subject and NTUP_PER_BUCKET. Having someone confirm what I saw wrt changing that parameter would be nice, and it would be a good comparison point against any kind of pre-filtering that we're doing.

One thing that re-reading the bloom filter description reminded me of is that it's at least conceivable that we could take the existing hash functions for each data type and do double-hashing, or perhaps seed the value to be hashed with additional data, to produce an "independent" hash result to use. Again, a lot of things need to be tested and measured to see if they improve overall performance.

Thanks,

Stephen
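Just to illustrate the double-hashing idea above: the standard trick (Kirsch-Mitzenmacher) is to derive k "independent" bit positions from only two base hash values, h1 + i*h2. A rough, hypothetical sketch in Python follows; the sha256-based base hashes are stand-ins for a datatype's real hash function computed with two different seeds, not anything PostgreSQL actually does:

```python
import hashlib

def base_hashes(value: bytes):
    # Hypothetical stand-in for hashing the value twice with different
    # seeds; h2 is forced odd so successive probes don't collapse.
    h1 = int.from_bytes(hashlib.sha256(value).digest()[:8], "big")
    h2 = int.from_bytes(hashlib.sha256(b"seed:" + value).digest()[:8], "big") | 1
    return h1, h2

class BloomFilter:
    def __init__(self, nbits=1 << 16, nhashes=4):
        self.nbits = nbits
        self.nhashes = nhashes
        self.bits = bytearray(nbits // 8)

    def _positions(self, value: bytes):
        h1, h2 = base_hashes(value)
        # Double-hashing: k derived hashes from two base hashes.
        return [(h1 + i * h2) % self.nbits for i in range(self.nhashes)]

    def add(self, value: bytes):
        for p in self._positions(value):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, value: bytes):
        # May return a false positive, never a false negative.
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(value))
```

The appeal for pre-filtering a hash join is that the outer side could be checked against a small filter built from the inner side before probing the full hash table; whether that actually pays off is exactly the kind of thing that needs measuring.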