Re: Use of MD5 as row keys - is this safe?

Damien Hardy Fri, 20 Jul 2012 09:32:23 -0700

On 20/07/2012 18:22, Jonathan Bishop wrote:
> Hi,
>
> I know it is commonly suggested to use an MD5 checksum to create a row
> key from some other identifier, such as a string or long. This is usually
> done to guard against hot-spotting and seems to work well.
>
> My concern is that there is no guard against collision when this is done - two
> different strings or longs could produce the same row-key. Although this is
> very unlikely, it is bothersome to consider this possibility for large
> systems.
>
> So what I usually do is concatenate the MD5 with the original identifier...
>
> MD5(id) + id
>
> which assures that the rowkey is both randomly distributed and unique.
>
> Is this necessary, or is it the common practice to just use the MD5
> checksum itself?
>
> Thanks,
>
> Jon


Hello Jonathan,

md5(id)+id is a good way to avoid hotspotting and ensure uniqueness.
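
A minimal sketch of the md5(id)+id key in Python, using only the standard
hashlib module. The function name salted_row_key is hypothetical, and the
UTF-8 encoding of the id is an assumption; nothing here touches the HBase
client API.

```python
import hashlib


def salted_row_key(identifier: str) -> bytes:
    """Build a row key as md5(id) + id (hypothetical helper name)."""
    raw = identifier.encode("utf-8")  # assumption: ids are text, encoded as UTF-8
    # The raw MD5 digest is always 16 bytes, so the original id can be
    # recovered later by dropping the first 16 bytes of the key.
    return hashlib.md5(raw).digest() + raw


if __name__ == "__main__":
    key = salted_row_key("user42")
    print(key.hex())
    print(key[16:].decode("utf-8"))  # the original id
```

Because the id itself is part of the key, two different ids can never yield
the same key, even if their digests happened to collide.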

md5(id)[0]+id could be another way: it limits the randomness of the row key
to 16 values.
You can then combine (with OR logic) 16 filters in a scanner (one for each
hex character possible in the md5 digest).
It also limits the balancing to 16 potential regions.
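
The one-character variant can be sketched the same way. This assumes the
prefix is the first character of the hex digest and that ids are UTF-8
text; bucketed_row_key and scan_prefixes are hypothetical names. The 16
prefixes returned by scan_prefixes are what you would feed to 16 prefix
filters OR-ed together in one scanner.

```python
import hashlib
from typing import List

HEX_DIGITS = "0123456789abcdef"  # the 16 characters a hex md5 digest can start with


def bucketed_row_key(identifier: str) -> str:
    """Build a row key as md5(id)[0] + id (hypothetical helper name)."""
    bucket = hashlib.md5(identifier.encode("utf-8")).hexdigest()[0]
    return bucket + identifier


def scan_prefixes(id_prefix: str) -> List[str]:
    """Return the 16 row-key prefixes that together cover every id
    starting with id_prefix, one per possible bucket character."""
    return [digit + id_prefix for digit in HEX_DIGITS]


if __name__ == "__main__":
    print(bucketed_row_key("user42"))
    print(scan_prefixes("user"))
```

The trade-off is the one described above: scans by id prefix stay possible,
but the keys spread over at most 16 buckets.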

Cheers,

-- 
Damien
