On Jun 6, 10:47 am, geremy condra <debat...@gmail.com> wrote: > On Fri, Jun 3, 2011 at 1:17 PM, Raymond Hettinger <pyt...@rcn.com> wrote: > > Thanks for all the feedback on the earlier post. > > > I've updated the recipe to use a cleaner API, simpler code, > > more easily subclassable, and with optional optimizations > > for better cache utilization and speed: > > > http://code.activestate.com/recipes/577684-bloom-filter/ > > Any chance of this landing in collections?
I would be happy to add something like this to collections once the API matures. Right now, the constructor expects the user to know the desired memory size and number of probes. Those could be computed from the maximum number of unique keys and the tolerable error false positive rate. The most convenient constructor could be: b = BloomFilter(myseq) where len(myseq) is presumed indicate the maximum number of unique entries and there is a default tolerable false positive rate of 1 in 10,000. The API already lets the user substitute different methods for get_probes(). It should also allow the bytearray to be easily replaced by other objects with indexable access (array objects, mmap objects, etc). We could also provide union, intersection, and difference operations but these would be a little awkward for general purpose use because the result is only meaningful when the sizes, probe functions, and input domain are all the same. Fortunately, there is a lot of prior art, so the API design can be informed by what has worked well for others. Raymond -- http://mail.python.org/mailman/listinfo/python-list