Ethan, Its not a question of collision attacks, although that is an indication of the strength of the hash.
Just because both MD5 and SHA-1 yield a key of the same length, that doesn't mean that they will have the same chance of a collision. The algorithm has some impact on this too. I agree that if you're looking to hash the key, MD5 should be good enough. If you don't like it, SHA-1 or SHA-2 are also available as part of Sun's JDK. (See Appendix A here: http://docs.oracle.com/javase/6/docs/technotes/guides/security/p11guide.html) Key's longer than 512 are not exportable from the US. The point I was making was that if you are not comfortable in using MD5 and you want to append the key to the hash, you may want to consider a different hash. In terms of Uniqueness, take a UUID , hash it, and then truncate the hash to 128 bits from 160 bits. This actually is a form of the UUID. While this could increase the likelihood of a collision, I haven't seen one in real use. The idea of creating a hash and then prepending it to the key is a tad pessimistic. That was the point. On Jul 22, 2012, at 9:21 AM, Ethan Jewett wrote: > To echo Joe Pallas: > > Any fairly "random" hash algorithm producing the same length output should > have about the same extremely small chance of producing the same output for > two different inputs - a collision. It's a problem you need to be aware of > no matter what hash algorithm you use. (Hash functions are mappings from a > theoretically infinite input space to a finitely large output space, so > they obviously generate the same output for multiple inputs.) > > SHA-1 specifically (and MD5 even more-so) has an attack that shows that > given a specific input and output, we can calculate a new input that > produces the same output with better than brute-force efficiency. > > Collisions and collision attacks are two different things. Collision > attacks are a problem for cryptographic uses like signing, but how does > this have anything to do with the problem of generating hBase row keys? > Just use the fastest, most accessible, random-enough algorithm you can > find, and if you are really worried about collisions then do something to > ensure that the key will be unique. Right? > > Cheers, > Ethan > > On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel > <[email protected]>wrote: > >> http://en.wikipedia.org/wiki/SHA-1 >> >> Check out the comparisons between the different SHA algos. >> >> In theory a collision was found for SHA-1, but none found for SHA-2 does >> that mean that a collision doesn't exist? No, it means that it hasn't >> happened yet and the odds are that it won't be found. Possible? Yes, >> however, highly improbable. You have a better chance of winning the lotto... >> >> The point was that if you are going to hash your key,then concatenate the >> initial key, you would be better off looking at the SHA-1 option. You have >> to consider a couple of factors... >> 1: availability of the algo. SHA-1 is in the standard java API and is >> readily available. >> 2: speed. Is SHA-1fast enough? Maybe, depending on your requirements. For >> most, I'll say probably. >> 3: Size of Key. SHA-1 is probably be smaller than having an MD-5 hash and >> the original key added. >> >> Just food for thought... >> >> Sent from a remote device. Please excuse any typos... >> >> Mike Segel >> >> On Jul 20, 2012, at 3:35 PM, Joe Pallas <[email protected]> wrote: >> >>> >>> On Jul 20, 2012, at 12:16 PM, Michel Segel wrote: >>> >>>> I don't believe that there has been any reports of collisions, but if. >> You are concerned you could use the SHA-1 for generating the hash. >> Relatively speaking, SHA-1is slower, but still fast enough for most >> applications. >>> >>> Every hash function can have collisions, by definition. If the >> correctness of your design depends on collisions being impossible, rather >> than very rare, then your design is faulty. >>> >>> Cryptographic hash functions have the property that it is >> computationally hard to create inputs that match a given output. That >> doesn’t in itself make cryptographic hash functions better than other hash >> functions for avoiding hot-spotting. (But it does usually make >> cryptographic hash functions more expensive to compute than other hash >> functions.) >>> >>> You may want to look at <http://www.strchr.com/hash_functions> and < >> http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633 >>> . >>> >>> Hope this helps, >>> joe >>> >>> >>
