I thought I would have a go at creating a very simple bloom filter in J. I used
a boolean array as the backing data structure. It seems to work ok:
Bloom filters are nice data structures for holding lossy information about
items you stick in them. You can't retrieve the items but you can perform a
test to see if the item has been placed into the bloom filter previously. An
interesting property is the imperfect nature of the test. It will return false
positives sometimes, items that were not placed into the bloom filter but the
test returns a positive result anyway.
One use I know of is for spell checkers. You don't need to have an actual array
/ map of correctly spelled words - just a bloom filter that you can test new
words against. Anyway, this is my implementation (the hash function is just a
simple test function - probably better to use MD5 or murmur or something).
NB. bloom filter
filtersize=: 100
keysize=:50
NB. the bloom filter expressed as a boolean array
BF=:filtersize $ 0
NB. silly hash -- used for simple testing
hash=: filtersize&|@:(29211&XOR)@:((3020&AND)+(1091&XOR))@:(+/)@:(a.&i.)@:":
lrhash =: hash@:(":,":)
NB. add an item to the bloom filter.
NB. Create an array of size 'keysize'
NB. and do multiple hashing for each
NB. item. These hashes are the indices
NB. to be inserted into the bloom filter.
additem=: 3 : 0
l=:(hash y),((<:keysize)$0)
indices=:lrhash/\ l
BF =:1 indices} BF
)
NB. Test if y is an item in the bloom filter.
NB. Possibility of false positive.
contains=: 3 : 0
l=:(hash y),((<:keysize)$0)
indices=:lrhash/\ l
NB. if a zero exists then this item
NB. was not added to the bloom filter.
-.(0 e. indices{BF)
)
Add some items to the bloom filter, animal names.
NB. Add some items ot the filter.
additem&.>
('dog';'cat';'bird';'octopus';'horse';'cow';'mouse';'chicken';'salmon';'hamster';'wolf';'elephant';'pig';'sheep';'zebra')
and we can test words:
contains 'dog'
1
contains 'human'
1
Oh, false positive. 'human' isn't in there.
contains 'Dog'
0
Case sensitive, obviously.
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm