On 20/07/2012 18:22, Jonathan Bishop wrote:
> Hi,
>
> I know it is commonly suggested to use an MD5 checksum to create a row
> key from some other identifier, such as a string or long. This is usually
> done to guard against hot-spotting and seems to work well.
>
> My concern is that there is no guard against collision when this is done - two
> different strings or longs could produce the same row-key. Although this is
> very unlikely, it is bothersome to consider this possibility for large
> systems.
>
> So what I usually do is concatenate the MD5 with the original identifier...
>
> MD5(id) + id
>
> which assures that the rowkey is both randomly distributed and unique.
>
> Is this necessary, or is it the common practice to just use the MD5
> checksum itself?
>
> Thanks,
>
> Jon

Hello Jonathan,

md5(id)+id is a good way to avoid hot-spotting and ensure uniqueness.

md5(id)[0]+id could be another way: it limits the randomness of the row key to 16 values. You can then combine (with OR logic) 16 filters in a scanner (one for each character that can appear in an MD5 hex digest). It also limits the balancing to 16 potential regions.

Cheers,

--
Damien
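
For illustration, here is a minimal Python sketch of the two key layouts discussed above. It only builds the keys and does not talk to HBase; the function names (full_hash_key, one_char_prefix_key, scan_prefixes) are hypothetical, and the choice of UTF-8 encoding and of a hex digest for the one-character prefix are assumptions rather than anything HBase requires.

```python
import hashlib

HEX_CHARS = "0123456789abcdef"  # every character an MD5 hex digest can start with


def full_hash_key(identifier):
    """md5(id) + id: the 16-byte digest followed by the original id.

    The digest prefix has a fixed length, so two different ids always
    give two different keys, even if their digests were to collide.
    """
    raw = identifier.encode("utf-8")  # assumption: ids are text, encoded as UTF-8
    return hashlib.md5(raw).digest() + raw


def one_char_prefix_key(identifier):
    """md5(id)[0] + id: the first hex character of the digest, then the id."""
    first = hashlib.md5(identifier.encode("utf-8")).hexdigest()[0]
    return first + identifier


def scan_prefixes(id_prefix=""):
    """The 16 row-key prefixes that a read over a range of ids has to cover."""
    return [c + id_prefix for c in HEX_CHARS]


if __name__ == "__main__":
    print(full_hash_key("user42"))
    print(one_char_prefix_key("user42"))
    print(scan_prefixes("user"))
```

With the one-character prefix, a read over ids starting with "user" becomes 16 prefix scans (or 16 prefix filters combined with OR), one per entry returned by scan_prefixes("user"). The price is that writes spread over at most 16 key ranges instead of the whole key space.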
