Hi,

I know it is commonly suggested to use an MD5 checksum to create a row
key from some other identifier, such as a string or long. This is usually
done to guard against hot-spotting and seems to work well.

My concern is that there is no guard against collisions when this is done - two
different strings or longs could produce the same row-key. Although this is
very unlikely, it is bothersome to consider this possibility for large
systems.

So what I usually do is concatenate the MD5 with the original identifier...

MD5(id) + id

which ensures that the rowkey is both randomly distributed and unique.
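
To make that concrete, here is a minimal sketch of the idea in Python. The function name make_row_key and the sample ids are made up for illustration, and I am assuming the id is a string encoded as UTF-8; a long would need its own fixed-width encoding.

```python
import hashlib


def make_row_key(identifier: str) -> bytes:
    """Build a row key as MD5(id) + id (hypothetical helper, for illustration)."""
    raw = identifier.encode("utf-8")
    # The 16-byte MD5 digest spreads keys across the key space.
    digest = hashlib.md5(raw).digest()
    # Appending the original id keeps keys distinct even if two digests collide.
    return digest + raw


# Sample ids are invented for this example.
key_a = make_row_key("user-1001")
key_b = make_row_key("user-1002")
```

Because the digest prefix is always 16 bytes, the original id can be read back from the key by dropping the first 16 bytes.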

Is this necessary, or is it the common practice to just use the MD5
checksum itself?

Thanks,

Jon
