To echo Joe Pallas: Any fairly "random" hash algorithm producing output of the same length should have about the same extremely small chance of producing the same output for two different inputs, that is, a collision. It's a problem you need to be aware of no matter what hash algorithm you use. (Hash functions are mappings from a theoretically infinite input space to a finite output space, so some distinct inputs must map to the same output.)
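To put a rough number on "extremely small", here is a minimal sketch using the standard birthday-bound approximation. It assumes the hash output is uniformly random, which is an idealization; the function name and the one-billion-key figure are illustrative choices, not anything from the thread.

```python
import math


def collision_probability(num_keys: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_keys distinct inputs
    collide, assuming uniformly random outputs of hash_bits bits
    (birthday bound: p ~= 1 - exp(-n(n-1) / (2 * 2^bits)))."""
    pairs = num_keys * (num_keys - 1) // 2   # number of distinct input pairs
    exponent = pairs / 2 ** hash_bits        # expected number of colliding pairs
    return -math.expm1(-exponent)            # 1 - exp(-x), accurate for tiny x


if __name__ == "__main__":
    # One billion keys against a few output widths.
    for name, bits in (("32-bit", 32), ("MD5", 128), ("SHA-1", 160)):
        print(f"{name:>6}: {collision_probability(10**9, bits):.3e}")
```

Under that assumption, a billion keys hashed to 128 bits collide with a probability on the order of 1e-21, while a 32-bit hash is essentially certain to collide. The output length drives the odds, not which algorithm produced it.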
SHA-1 specifically (and MD5 even more so) has known collision attacks: ways to construct two different inputs that produce the same output with better than brute-force efficiency. Collisions and collision attacks are two different things. Collision attacks are a problem for cryptographic uses like signing, but what does this have to do with the problem of generating HBase row keys?

Just use the fastest, most accessible, random-enough algorithm you can find, and if you are really worried about collisions then do something to ensure that the key will be unique.

Right?

Cheers,
Ethan

On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel <[email protected]> wrote:

> http://en.wikipedia.org/wiki/SHA-1
>
> Check out the comparisons between the different SHA algos.
>
> In theory a collision was found for SHA-1, but none has been found for
> SHA-2. Does that mean that a collision doesn't exist? No, it means that
> it hasn't happened yet and the odds are that it won't be found.
> Possible? Yes, however, highly improbable. You have a better chance of
> winning the lotto...
>
> The point was that if you are going to hash your key, then concatenate
> the initial key, you would be better off looking at the SHA-1 option.
> You have to consider a couple of factors...
> 1: Availability of the algo. SHA-1 is in the standard Java API and is
> readily available.
> 2: Speed. Is SHA-1 fast enough? Maybe, depending on your requirements.
> For most, I'll say probably.
> 3: Size of key. SHA-1 will probably be smaller than having an MD-5 hash
> and the original key added.
>
> Just food for thought...
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Jul 20, 2012, at 3:35 PM, Joe Pallas <[email protected]> wrote:
>
> > On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
> >
> >> I don't believe that there have been any reports of collisions, but
> >> if you are concerned you could use SHA-1 for generating the hash.
> >> Relatively speaking, SHA-1 is slower, but still fast enough for most
> >> applications.
> >
> > Every hash function can have collisions, by definition. If the
> > correctness of your design depends on collisions being impossible,
> > rather than very rare, then your design is faulty.
> >
> > Cryptographic hash functions have the property that it is
> > computationally hard to create inputs that match a given output. That
> > doesn’t in itself make cryptographic hash functions better than other
> > hash functions for avoiding hot-spotting. (But it does usually make
> > cryptographic hash functions more expensive to compute than other
> > hash functions.)
> >
> > You may want to look at <http://www.strchr.com/hash_functions> and
> > <http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633>.
> >
> > Hope this helps,
> > joe
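
For what it's worth, the "hash the key, then concatenate the initial key" idea from the thread can be sketched roughly as below. This is only an illustration: `make_row_key` and the 4-byte prefix length are made-up choices, SHA-1 is used simply because it came up in the discussion, and any reasonably uniform hash would serve for spreading keys across regions.

```python
import hashlib


def make_row_key(original_key: bytes, prefix_len: int = 4) -> bytes:
    """Hypothetical helper: a fixed-length hash prefix spreads keys out,
    and the appended original key keeps the row key unique even if two
    prefixes happen to collide."""
    digest = hashlib.sha1(original_key).digest()   # 20-byte SHA-1 digest
    return digest[:prefix_len] + original_key      # prefix + original key


if __name__ == "__main__":
    for key in (b"user-0001", b"user-0002"):
        print(make_row_key(key).hex())
```

Because the prefix has a fixed length and the original key is kept intact, two different original keys can never produce the same row key, so a hash collision costs nothing more than two rows sharing a prefix.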
