Selecting the correct data structure is key. The fastest option is going to be
a hash table, with O(1) lookups. With 24 million 256-bit hashes, the raw key
data alone is 24,000,000 x 32 bytes, roughly 768 MB, so plan on about 800-900
MB of memory.
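To make that concrete, here's a rough, untested C++ sketch of holding the 24
million digests in an in-memory hash table (I'm using a set rather than a map
since only membership matters; the names are just illustrative):

// Rough sketch: average O(1) membership checks for 24M SHA-256 digests in RAM.
// Assumes the digests are available as raw 32-byte values.
#include <array>
#include <cstdint>
#include <cstring>
#include <unordered_set>

using Digest = std::array<std::uint8_t, 32>;

// SHA-256 output is already uniformly distributed, so the leading bytes of
// the digest make a perfectly good bucket hash on their own.
struct DigestHash {
    std::size_t operator()(const Digest& d) const {
        std::size_t h = 0;
        std::memcpy(&h, d.data(), sizeof(h));
        return h;
    }
};

int main() {
    std::unordered_set<Digest, DigestHash> known;
    known.reserve(24000000);     // 24M keys * 32 bytes ~= 768 MB of raw key
                                 // data, before per-bucket overhead

    // ... load the 24 million known hashes into `known` here ...

    Digest candidate{};          // digest of the string being tested
    bool collision = known.count(candidate) != 0;   // average O(1) lookup
    return collision ? 1 : 0;
}

The 768 MB of raw key data is where the 800-900 MB ballpark comes from; a
node-based table like unordered_set adds some per-entry overhead on top of
that.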
Try using a

  std::unordered_map<sha_bits, bool>   // C++
  HashMap<sha_bits, Boolean>           // Java

If you need to compress the space down, you can use a trie:
http://en.wikipedia.org/wiki/Trie

If you want quick and dirty, you could just use memcached and still probably
get the performance you want (a rough sketch is at the bottom of this message,
below the quoted thread).

I would avoid using a database if at all possible.

Adam

On Fri, Dec 27, 2013 at 10:55 AM, S. Dale Morrey <[email protected]> wrote:

> I would love for you to tell me that, but still I'm trying to verify a
> particular implementation of the algorithm.
>
> Your CDB file idea is a good one. I'm going to investigate it further.
>
>
> On Fri, Dec 27, 2013 at 10:50 AM, Steve Meyers <[email protected]> wrote:
>
> > On 12/27/13 10:43 AM, S. Dale Morrey wrote:
> >
> >> Yes, that is exactly what I'm doing. I'm checking the propensity for a
> >> random string of characters to have a hash collision with an existing
> >> known set of words, given an unsalted hashing algorithm.
> >
> > Can I save you the trouble by telling you how unlikely it is to happen?
> >
> > Since your 24 million existing hashes are static, I'd look into how big a
> > CDB file would be, and then use that to check for collisions.
> >
> > Steve
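P.S. Here is the quick-and-dirty memcached idea sketched out with libmemcached.
Untested; the server address and the hex key encoding are just assumptions, and
you would pre-load the 24 million digests the same way the single set call does
below:

// Quick-and-dirty membership check against memcached via libmemcached.
// Untested sketch: host/port and the hex key encoding are assumptions.
#include <libmemcached/memcached.h>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hex-encode the 32-byte digest so the key is safe for the text protocol.
static std::string hex_key(const unsigned char *digest, size_t len) {
    static const char *hexdig = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);
    for (size_t i = 0; i < len; ++i) {
        out.push_back(hexdig[digest[i] >> 4]);
        out.push_back(hexdig[digest[i] & 0x0f]);
    }
    return out;
}

int main() {
    memcached_st *memc = memcached_create(NULL);
    memcached_server_add(memc, "localhost", 11211);   // assumed local memcached

    unsigned char digest[32] = {0};   // the SHA-256 you want to test
    std::string key = hex_key(digest, sizeof(digest));

    // Pre-loading step (done once for each of the 24M known hashes):
    memcached_set(memc, key.data(), key.size(), "1", 1, (time_t)0, (uint32_t)0);

    // Lookup step: a hit means a collision with the known set.
    size_t value_len = 0;
    uint32_t flags = 0;
    memcached_return_t rc;
    char *value = memcached_get(memc, key.data(), key.size(),
                                &value_len, &flags, &rc);
    if (rc == MEMCACHED_SUCCESS) {
        std::puts("collision");
        free(value);              // libmemcached malloc()s the returned value
    } else {
        std::puts("no collision");
    }

    memcached_free(memc);
    return 0;
}

You pay a network round trip per check compared to an in-process hash set, but
the 24 million keys live outside your process.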
