So here's the problem... I'm exploring the strength of the SHA-256 algorithm. Specifically, I'm looking into the possibility of finding a hash collision.
To that end I took a dictionary of common words and phrases and ran them through the algorithm. Now I've got a list of 24 million strings stored one per line in a flat text file. The file is just shy of 1GB, which isn't too bad considering the dictionary I borrowed was about 700MB.

Now I want to check for collisions in random space. I have another process generating seemingly random strings, and I want to check the hashes of those random strings against this file in as little time per lookup as possible. I've already run the file through sort, so the hashes are in alphabetical order; now I need a way to do the comparison as quickly as possible. If a string is a match, I need to store the new string and its initialization vector.

I'm thinking grep would be good for this, but it seems to take a couple of seconds to come back when searching for a single item, and I don't see any way to have it read stdin and look for a whole list. I'd like to do this with POSIX tools, but I'm thinking I may have to write my own app to slurp the file into a table of some sort. A database is a possibility I guess, but the latency seems like it might be higher than some sort of in-memory cache. Just wondering: what would be the fastest way to do this?
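
To make it concrete, here's roughly what I'm picturing if I end up writing my own little app. It's just a sketch (the file name and the tab-separated hash/IV input format are placeholders I made up), and it binary-searches the already-sorted file rather than loading the whole thing into memory:

#!/usr/bin/env python3
"""Rough sketch: binary-search the already-sorted hash file via mmap
instead of grepping it for every candidate.

Placeholders / assumptions:
  - hashes.sorted.txt : the sorted ~1GB file, one hash per line
  - stdin delivers candidates as "hash<TAB>iv" pairs from the generator
"""
import mmap
import sys

def contains(mm, target):
    """True if `target` (bytes, no newline) appears as a full line in the
    sorted, newline-delimited memory-mapped file `mm`."""
    lo, hi = 0, mm.size()
    while lo < hi:
        mid = (lo + hi) // 2
        start = mm.rfind(b"\n", 0, mid) + 1   # start of the line containing mid
        end = mm.find(b"\n", start)           # end of that line
        if end == -1:
            end = mm.size()
        line = mm[start:end]
        if line == target:
            return True
        elif line < target:
            lo = end + 1                      # keep searching above this line
        else:
            hi = start                        # keep searching below this line
    return False

def main():
    with open("hashes.sorted.txt", "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for raw in sys.stdin.buffer:
            digest, iv = raw.rstrip(b"\n").split(b"\t", 1)
            if contains(mm, digest):
                # Found a match: record the hash and its initialization vector.
                print("COLLISION\t%s\t%s" % (digest.decode(), iv.decode()), flush=True)

if __name__ == "__main__":
    main()

The idea is that with 24 million sorted lines each lookup is only about 25 comparisons against the memory-mapped file, so after the first few queries the pages it touches sit in the kernel's page cache, instead of re-scanning the whole 1GB per query the way grep does.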
