So here's the problem... I'm exploring the strength of the SHA-256 algorithm. Specifically, I'm looking into the possibility of finding a hash collision.
To that end I took a dictionary of common words and phrases and ran them through the algorithm. Now I've got a list of 24 million strings stored one per line in a flat text file. The file is just shy of 1GB, which isn't too bad considering the dictionary I borrowed was about 700MB.

Now I want to check for collisions in random space. I have another process generating seemingly random strings, and I want to check the hashes of those random strings against this file in as little time per lookup as possible. I've already run the file through sort, so the hashes are in alphabetical order; now I need a way to do the comparison as quickly as possible. If a string is a match, I need to store the new string and its initialization vector.

I'm thinking grep would be good for this, but it seems to take a couple of seconds to come back when searching for a single item, and I don't see any way to have it read stdin and look for a whole list. I'd like to do this with POSIX tools, but I'm thinking I may have to write my own app to slurp the file into a table of some sort. A database is a possibility I guess, but the latency seems like it might be higher than some sort of in-memory cache. Just wondering: what would be the fastest way to do this?
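
To make it concrete, here's roughly what I'm picturing if I end up writing my own little app. It's just a sketch (the file name and the tab-separated hash/IV input format are placeholders I made up), and it binary-searches the already-sorted file rather than loading the whole thing into memory:

#!/usr/bin/env python3
"""Rough sketch: binary-search the already-sorted hash file via mmap
instead of grepping it for every candidate.

Placeholders / assumptions:
  - hashes.sorted.txt : the sorted ~1GB file, one hash per line
  - stdin delivers candidates as "hash<TAB>iv" pairs from the generator
"""
import mmap
import sys

def contains(mm, target):
    """True if `target` (bytes, no newline) appears as a full line in the
    sorted, newline-delimited memory-mapped file `mm`."""
    lo, hi = 0, mm.size()
    while lo < hi:
        mid = (lo + hi) // 2
        start = mm.rfind(b"\n", 0, mid) + 1   # start of the line containing mid
        end = mm.find(b"\n", start)           # end of that line
        if end == -1:
            end = mm.size()
        line = mm[start:end]
        if line == target:
            return True
        elif line < target:
            lo = end + 1                      # keep searching above this line
        else:
            hi = start                        # keep searching below this line
    return False

def main():
    with open("hashes.sorted.txt", "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        for raw in sys.stdin.buffer:
            digest, iv = raw.rstrip(b"\n").split(b"\t", 1)
            if contains(mm, digest):
                # Found a match: record the hash and its initialization vector.
                print("COLLISION\t%s\t%s" % (digest.decode(), iv.decode()), flush=True)

if __name__ == "__main__":
    main()

The idea is that with 24 million sorted lines each lookup is only about 25 comparisons against the memory-mapped file, so after the first few queries the pages it touches sit in the kernel's page cache, instead of re-scanning the whole 1GB per query the way grep does.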
