daniel added a comment.

To reduce the memory consumption of the approach I suggested above, use part of
the hash (the first few hex digits) as the key in the "seen" array, and keep the
full hash as the value. For instance, using 4 hex digits (16 bits) would limit
the size of the "seen" array to 2^16 entries.
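A minimal sketch of the key derivation and the resulting size bound. It assumes a SHA-1 hex digest as the hash (the comment does not name one); `full_hash` and `key` are hypothetical helper names.

```python
import hashlib

KEY_DIGITS = 4  # assumption: the first 4 hex digits of the hash form the key


def full_hash(x: str) -> str:
    # Assumption: SHA-1 hex digest; any hash with a hex digest works the same way.
    return hashlib.sha1(x.encode("utf-8")).hexdigest()


def key(h: str) -> str:
    # Each hex digit carries 4 bits, so 4 digits give a 16-bit key.
    return h[:KEY_DIGITS]


max_entries = 16 ** KEY_DIGITS
print(max_entries)  # prints 65536, i.e. 2**16
```

Because there are only `16 ** KEY_DIGITS` possible keys, the array can never hold more entries than that, no matter how many items are checked.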

When looking up `$x` (the full hash):

- `!isset( $seen[ key($x) ] )`  ->  not seen
- `isset( $seen[ key($x) ] ) && $seen[ key($x) ] === $x`  ->  seen
- `isset( $seen[ key($x) ] ) && $seen[ key($x) ] !== $x`  ->  probably not seen
  (`$x` may have been seen earlier and then overwritten by a colliding hash)

Finally, set `$seen[ key($x) ] = $x`.
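The lookup and update rules above can be sketched as a small class, assuming a SHA-1 hex digest as the hash and a dict in place of the PHP array; `SeenFilter` and `check_and_add` are hypothetical names. A "seen" answer is never a false positive, but a collision can evict an entry and cause a false negative.

```python
import hashlib


class SeenFilter:
    """'Opposite of a Bloom filter': no false positives, possible false negatives."""

    def __init__(self, key_digits: int = 4) -> None:
        self.key_digits = key_digits
        self.seen: dict[str, str] = {}  # key prefix -> full hash

    @staticmethod
    def _hash(x: str) -> str:
        # Assumption: SHA-1 stands in for whatever hash the original approach uses.
        return hashlib.sha1(x.encode("utf-8")).hexdigest()

    def check_and_add(self, x: str) -> bool:
        """Return True only if x was definitely seen before, then record x."""
        h = self._hash(x)
        k = h[: self.key_digits]
        stored = self.seen.get(k)
        # stored is None -> not seen
        # stored == h    -> seen
        # stored != h    -> probably not seen (or seen, then overwritten)
        was_seen = stored == h
        self.seen[k] = h  # always store the latest full hash under its key
        return was_seen
```

For example, `f = SeenFilter()` returns `False` for the first `f.check_and_add("Q42")` and `True` for the second. With `key_digits=0` every item shares one slot, which makes the false-negative case easy to observe: after adding `"a"` and then `"b"`, checking `"a"` again returns `False`.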

Hat tip to
http://www.somethingsimilar.com/2012/05/21/the-opposite-of-a-bloom-filter/ and
https://news.ycombinator.com/item?id=4251313


TASK DETAIL
  https://phabricator.wikimedia.org/T92586

To: Smalyshev, daniel
Cc: daniel, Manybubbles, Aklapper, Smalyshev, jkroll, Wikidata-bugs, Jdouglas, 
aude, GWicke, JanZerebecki



_______________________________________________
Wikidata-bugs mailing list
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
