Good question. You could load the lot into a Ruby Hash. Hash lookups are super fast.
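Something like this, as a rough sketch (the file name and the one-digest-per-line
format are assumptions):

    require "digest"

    # Load the precomputed digests into a Hash for O(1) membership checks.
    # "hashes.txt" is assumed to hold one hex digest per line.
    known = {}
    File.foreach("hashes.txt") { |line| known[line.chomp] = true }

    # Hash each candidate string and check it against the set.
    candidate = "some random string"
    digest = Digest::SHA256.hexdigest(candidate)
    puts "collision: #{candidate}" if known.key?(digest)

Bear in mind that 24 million 64-character keys will likely cost a few GB of
RAM once Ruby's per-object overhead is counted, so you'd want a machine with
memory to spare.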
You could also br

Thank you,
Nicholas Stewart


On Fri, Dec 27, 2013 at 1:59 AM, S. Dale Morrey <[email protected]> wrote:
> So here's the problem...
>
> I'm exploring the strength of the SHA256 algorithm.
> Specifically, I'm looking for the possibility of a hash collision.
> To that end I took a dictionary of common words and phrases and ran
> them through the algorithm.
> Now I've got a list of 24 million strings stored one to a line in a
> flat text file.
> The file is just shy of 1GB. Not too bad, considering the dictionary
> I borrowed was about 700MB.
>
> Now I want to check for collisions in random space. I have another
> process generating other seemingly random strings, and I want to check
> the hashes of those random strings against this file in the shortest
> amount of time per unit possible.
>
> I already used sort, so the hashes are now in alphabetical order.
>
> So now I need to find a way to do the comparison as quickly as
> possible. If the string is a match, I need to store the new string and
> its initialization vector.
>
> I'm thinking grep would be good for this, but it seems to take a
> couple of seconds to come back when searching for a single item, and I
> don't see any way to have it read stdin and look for a list.
> I'd like to do this with POSIX tools, but I'm thinking I may have to
> write my own app to slurp it up into a table of some sort. A database
> is a possibility, I guess, but the latency seems like it might be
> higher than some sort of in-memory caching.
>
> Just wondering: what would be the fastest way to do this?
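Regarding the grep question above: grep can read a list of patterns from a
file with -f, so a batch of candidate digests can be checked in a single pass
over the big file rather than one grep per string. A quick sketch (the file
names are assumptions; candidates.txt would hold one hex digest per line):

    # -F: treat patterns as fixed strings (no regex),
    # -x: match whole lines only,
    # -f: read the pattern list from a file.
    grep -F -x -f candidates.txt hashes.txt

    # Since hashes.txt is already sorted, comm can also print the
    # intersection of the two files, provided candidates.txt is
    # sorted as well:
    comm -12 hashes.txt candidates.txt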
