Is there a good aspect to reporting false positives?

On Mon, Nov 2, 2015 at 10:31 AM, 'Jon Hough' via Programming <
programm...@jsoftware.com> wrote:

>
> I thought I would have a go at creating a very simple bloom filter in J. I
> used a boolean array as the backing data structure. It seems to work ok:
> Bloom filters are nice data structures for holding lossy information about
> items you stick in them. You can't retrieve the items but you can perform a
> test to see if the item has been placed into the bloom filter previously.
> An interesting property is the imperfect nature of the test. It will return
> false positives sometimes, items that were not placed into the bloom filter
> but the test returns a positive result anyway.
> One use I know of is for spell checkers. You don't need to have an actual
> array / map of correctly spelled words - just a bloom filter that you can
> test new words against. Anyway, this is my implementation (the hash
> function is just a simple test function - probably better to use MD5 or
> murmur or something).
>
>
> NB. bloom filter
>
> filtersize=: 100
> keysize=:50
>
> NB. the bloom filter expressed as a boolean array
> BF=:filtersize $ 0
>
> NB. silly hash -- used for simple testing
> hash=:
> filtersize&|@:(29211&XOR)@:((3020&AND)+(1091&XOR))@:(+/)@:(a.&i.)@:":
>
> lrhash =: hash@:(":,":)
>
>
> NB. add an item to the bloom filter.
> NB. Create an array of size 'keysize'
> NB. and do multiple hashing for each
> NB. item. These hashes are the indices
> NB. to be inserted into the bloom filter.
> additem=: 3 : 0
> l=:(hash y),((<:keysize)$0)
> indices=:lrhash/\ l
> BF =:1 indices} BF
> )
>
>
> NB. Test if y is an item in the bloom filter.
> NB. Possibility of false positive.
> contains=: 3 : 0
> l=:(hash y),((<:keysize)$0)
> indices=:lrhash/\ l
> NB. if a zero exists then this item
> NB. was not added to the bloom filter.
> -.(0 e. indices{BF)
>
> )
>
>
> Add some items to the bloom filter, animal names.
>
> NB. Add some items ot the filter.
> additem&.>
> ('dog';'cat';'bird';'octopus';'horse';'cow';'mouse';'chicken';'salmon';'hamster';'wolf';'elephant';'pig';'sheep';'zebra')
>
> and we can test words:
>
> contains 'dog'
>  1
>
> contains 'human'
>  1
>
> Oh, false positive. 'human' isn't in there.
>
> contains 'Dog'
>  0
>
> Case sensitive, obviously.
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>



-- 

Devon McCormick, CFA

Quantitative Consultant
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm

Reply via email to