This is actually what I already mentioned - with rainbow tables kept in memory it could be really fast!
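
For a sense of the shape this takes in Spark, here is a minimal sketch of the
simplest in-memory variant: a toy hash-to-password map shipped to every
executor as a broadcast variable. (Real rainbow tables store chain start and
end points rather than a full map, and all the names and the tiny password
list below are made up for illustration.)

import java.security.MessageDigest
import org.apache.spark.{SparkConf, SparkContext}

object InMemoryLookup {
  def sha1Hex(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("sha1-lookup"))

    // Stand-in for a table precomputed offline; here just three entries.
    val table = Seq("secret", "hunter2", "passw0rd")
      .map(p => sha1Hex(p) -> p).toMap

    // Ship the whole table once to every executor; lookups are then local.
    val lookup = sc.broadcast(table)

    val hashesToCrack = sc.parallelize(Seq(sha1Hex("hunter2")))
    hashesToCrack
      .flatMap(h => lookup.value.get(h).map(p => s"$h -> $p"))
      .collect()
      .foreach(println)
  }
}

A table of real size would never fit in a single Map, but the same shape works
with the chain files in HDFS: load them as an RDD of (endpoint, start) pairs
and join against the hashes instead of broadcasting.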
Marek

2014-06-12 9:25 GMT+02:00 Michael Cutler <mich...@tumra.com>:

> Hi Nick,
>
> The great thing about *unsalted* hashes is that you can precompute them
> ahead of time; then it is just a lookup to find the password that matches
> a given hash in seconds -- always makes for a more exciting demo than
> "come back in a few hours".
>
> It is a no-brainer to write a generator function to create all possible
> passwords from a charset like
> "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789", hash
> them, and store them to look up later. It is, however, incredibly
> wasteful of storage space:
>
>    - all passwords from 1 to 9 characters long
>    - using the charset above = 13,759,005,997,841,642 passwords
>    - assuming 20 bytes to store the SHA-1 and up to 9 bytes to store the
>    password, that equals approximately 399 petabytes
>
> Thankfully there is already a more efficient and compact mechanism to
> achieve this: Rainbow Tables <http://en.wikipedia.org/wiki/Rainbow_table>.
> Better still, there is an active community of people who have already
> precomputed many of these datasets. The dataset above is readily available
> to download and is just 864GB -- much more feasible.
>
> All you need to do then is write a rainbow-table lookup function in Spark
> and leverage the precomputed files stored in HDFS. Done right, you should
> be able to achieve interactive (few-second) lookups.
>
> Have fun!
>
> MC
>
> *Michael Cutler*
> Founder, CTO
>
> On 12 June 2014 01:24, Nick Chammas <nicholas.cham...@gmail.com> wrote:
>
>> Spark is obviously well-suited to crunching massive amounts of data. How
>> about crunching massive amounts of numbers?
>>
>> A few years ago I put together a little demo for some co-workers to
>> demonstrate the dangers of using SHA1
>> <http://codahale.com/how-to-safely-store-a-password/> to hash and store
>> passwords. Part of the demo included a live brute-forcing of hashes to
>> show how SHA1's speed made it unsuitable for hashing passwords.
>>
>> I think it would be cool to redo the demo, but utilize the power of a
>> cluster managed by Spark to crunch through hashes even faster.
>>
>> But how would you do that with Spark (if at all)?
>>
>> I'm guessing you would create an RDD that somehow defines the search
>> space you're going to go through, and then partition it to divide the
>> work up equally amongst the cluster's cores. Does that sound right?
>>
>> I wonder if others have already used Spark for computationally intensive
>> workloads like this, as opposed to just data-intensive ones.
>>
>> Nick
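
As a concrete sketch of the brute-force approach Nick asks about: define the
search space as a bijection from a Long index to a candidate password (the
"generator function" Michael mentions), then let an RDD over the index range
split the work evenly across the cluster's cores. Everything below -- the
object and function names, the restriction to lengths 1-3 so it finishes
quickly -- is made up for illustration:

import java.security.MessageDigest
import org.apache.spark.{SparkConf, SparkContext}

object BruteForce {
  val charset =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  val base = charset.length // 62

  // Map a Long index to a unique candidate password, enumerating all
  // length-1 passwords first, then length-2, and so on.
  def password(index: Long): String = {
    var i = index
    var len = 1
    var block = base.toLong // number of candidates of the current length
    while (i >= block) { i -= block; block *= base; len += 1 }
    val sb = new StringBuilder
    (1 to len).foreach { _ =>
      sb.append(charset((i % base).toInt)); i /= base
    }
    sb.reverse.toString
  }

  def sha1Hex(s: String): String =
    MessageDigest.getInstance("SHA-1")
      .digest(s.getBytes("UTF-8")).map("%02x".format(_)).mkString

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("brute-force"))
    val target = sha1Hex("Xy9") // the hash we want to crack

    // The RDD *is* the search space: an index range sliced evenly across
    // the cluster's cores (only lengths 1-3 here, 242,234 candidates).
    val total = 62L + 62L * 62L + 62L * 62L * 62L
    val candidates = sc.parallelize(0L until total, numSlices = 64)

    candidates
      .map(password)
      .filter(p => sha1Hex(p) == target)
      .take(1)
      .foreach(p => println(s"cracked: $target -> $p"))
  }
}

Because take(1) evaluates partitions in increasing batches, the job can stop
early when a match turns up in an early partition instead of hashing the
whole keyspace.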
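
The rainbow-table lookup function Michael suggests is more involved; at its
core it is a position-dependent reduce step plus a chain walk. Below is a
heavily cut-down, self-contained sketch -- the reduce function, chain length,
and 4-character keyspace are toys, and real tables (e.g. the RainbowCrack
files) use a different on-disk format. In practice the endpoints map would be
loaded from the precomputed files in HDFS (e.g. via sc.textFile) and
broadcast or joined as an RDD rather than built locally:

import java.security.MessageDigest

object RainbowSketch {
  val charset =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
  val passwordLen = 4  // toy keyspace, not 1-9 characters
  val chainLen = 1000

  def sha1(s: String): Array[Byte] =
    MessageDigest.getInstance("SHA-1").digest(s.getBytes("UTF-8"))

  // Position-dependent reduction: turn a hash back into *some* password.
  def reduce(h: Array[Byte], pos: Int): String =
    (0 until passwordLen).map { k =>
      charset(((h(k % h.length) & 0xff) + pos) % charset.length)
    }.mkString

  // Apply `steps` hash/reduce rounds starting at reduction position `from`.
  def walk(start: String, from: Int, steps: Int): String = {
    var p = start; var i = from
    while (i < from + steps) { p = reduce(sha1(p), i); i += 1 }
    p
  }

  // Build one chain for the table: (endpoint -> start password).
  def chain(start: String): (String, String) =
    (walk(start, 0, chainLen), start)

  // Re-walk a chain from its start, looking for a preimage of `target`.
  def regenerate(start: String, target: Array[Byte]): Option[String] = {
    var p = start
    for (i <- 0 until chainLen) {
      if (sha1(p).sameElements(target)) return Some(p)
      p = reduce(sha1(p), i)
    }
    None
  }

  // Try every position the target hash could occupy in a chain; on an
  // endpoint match, regenerate that one chain to confirm.
  def lookup(target: Array[Byte],
             endpoints: Map[String, String]): Option[String] =
    (chainLen - 1 to 0 by -1).iterator.map { pos =>
      val end = walk(reduce(target, pos), pos + 1, chainLen - 1 - pos)
      endpoints.get(end).flatMap(start => regenerate(start, target))
    }.collectFirst { case Some(p) => p }

  def main(args: Array[String]): Unit = {
    val endpoints = Map(Seq("aaaa", "Zz99").map(chain): _*)
    // Crack the hash of a password sitting mid-chain.
    println(lookup(sha1(walk("aaaa", 0, 500)), endpoints))
  }
}

Looking up one hash costs on the order of chainLen^2/2 hash-and-reduce
operations; that compute-for-storage trade is exactly what shrinks the
hundreds-of-petabytes full table down to the 864GB of chain files mentioned
above.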