Re: Use of MD5 as row keys - is this safe?

Michael Segel Mon, 23 Jul 2012 04:55:54 -0700

Ethan,

Its not a question of collision attacks, although that is an indication of the 
strength of the hash.


Just because both MD5 and SHA-1 yield a key of the same length, that doesn't 
mean that they will have the same chance of a collision. The algorithm has some 
impact on this too. 

I agree that if you're looking to hash the key, MD5 should be good enough. If 
you don't like it, SHA-1 or SHA-2 are also available as part of Sun's JDK. 
(See Appendix A here: 
http://docs.oracle.com/javase/6/docs/technotes/guides/security/p11guide.html) 

Key's longer than 512 are not exportable from the US. 

The point I was making was that if you are not comfortable in using MD5 and you 
want to append the key to the hash, you may want to consider a different hash. 

In terms of Uniqueness, take a UUID , hash it, and then truncate the hash to 
128 bits from 160 bits. This actually is a form of the UUID. While this could 
increase the likelihood of a collision, I haven't seen one in real use. 

The idea of creating a hash and then prepending it to the key is a tad 
pessimistic.

That was the point. 
 


On Jul 22, 2012, at 9:21 AM, Ethan Jewett wrote:

> To echo Joe Pallas:
> 
> Any fairly "random" hash algorithm producing the same length output should
> have about the same extremely small chance of producing the same output for
> two different inputs - a collision. It's a problem you need to be aware of
> no matter what hash algorithm you use. (Hash functions are mappings from a
> theoretically infinite input space to a finitely large output space, so
> they obviously generate the same output for multiple inputs.)
> 
> SHA-1 specifically (and MD5 even more-so) has an attack that shows that
> given a specific input and output, we can calculate a new input that
> produces the same output with better than brute-force efficiency.
> 
> Collisions and collision attacks are two different things. Collision
> attacks are a problem for cryptographic uses like signing, but how does
> this have anything to do with the problem of generating hBase row keys?
> Just use the fastest, most accessible, random-enough algorithm you can
> find, and if you are really worried about collisions then do something to
> ensure that the key will be unique. Right?
> 
> Cheers,
> Ethan
> 
> On Sun, Jul 22, 2012 at 2:00 PM, Michel Segel 
> <[email protected]>wrote:
> 
>> http://en.wikipedia.org/wiki/SHA-1
>> 
>> Check out the comparisons between the different SHA algos.
>> 
>> In theory a collision was found for SHA-1, but none found for SHA-2 does
>> that mean that a collision doesn't exist? No, it means that it hasn't
>> happened yet and the odds are that it won't be found. Possible? Yes,
>> however, highly improbable. You have a better chance of winning the lotto...
>> 
>> The point was that if you are going to hash your key,then concatenate the
>> initial key, you would be better off looking at the SHA-1 option. You have
>> to consider a couple of factors...
>> 1: availability of the algo. SHA-1 is in the standard java API and is
>> readily available.
>> 2: speed. Is SHA-1fast enough? Maybe, depending on your requirements. For
>> most, I'll say probably.
>> 3: Size of Key. SHA-1 is probably be smaller than having an MD-5 hash and
>> the original key added.
>> 
>> Just food for thought...
>> 
>> Sent from a remote device. Please excuse any typos...
>> 
>> Mike Segel
>> 
>> On Jul 20, 2012, at 3:35 PM, Joe Pallas <[email protected]> wrote:
>> 
>>> 
>>> On Jul 20, 2012, at 12:16 PM, Michel Segel wrote:
>>> 
>>>> I don't believe that there has been any reports of collisions, but if.
>> You are concerned you could use the SHA-1 for generating the hash.
>> Relatively speaking, SHA-1is slower, but still fast enough for most
>> applications.
>>> 
>>> Every hash function can have collisions, by definition.  If the
>> correctness of your design depends on collisions being impossible, rather
>> than very rare, then your design is faulty.
>>> 
>>> Cryptographic hash functions have the property that it is
>> computationally hard to create inputs that match a given output.  That
>> doesn’t in itself make cryptographic hash functions better than other hash
>> functions for avoiding hot-spotting.  (But it does usually make
>> cryptographic hash functions more expensive to compute than other hash
>> functions.)
>>> 
>>> You may want to look at <http://www.strchr.com/hash_functions>  and <
>> http://programmers.stackexchange.com/questions/49550/which-hashing-algorithm-is-best-for-uniqueness-and-speed/145633#145633
>>> .
>>> 
>>> Hope this helps,
>>> joe
>>> 
>>> 
>>

Re: Use of MD5 as row keys - is this safe?

Reply via email to