Hi, I know it is commonly suggested to use an MD5 checksum to create a row key from some other identifier, such as a string or long. This is usually done to guard against hot-spotting and seems to work well.
My concern is that there is no guard against collisions when this is done: two different strings or longs could produce the same row key. Although this is very unlikely, it is bothersome to consider the possibility in large systems. So what I usually do is concatenate the MD5 with the original identifier, i.e. MD5(id) + id, which ensures that the row key is both randomly distributed and unique. Is this necessary, or is it common practice to just use the MD5 checksum itself?
Thanks,
Jon
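
For concreteness, here is a minimal sketch of the MD5(id) + id scheme described above. It is written in Python for brevity and does not use any HBase API; the function names (salted_row_key, long_to_bytes, original_id) and the choice of an 8-byte big-endian encoding for longs are assumptions made for illustration only.

```python
import hashlib
import struct

MD5_LEN = 16  # an MD5 digest is always 16 bytes


def long_to_bytes(value: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes (assumed encoding)."""
    return struct.pack(">q", value)


def salted_row_key(identifier: bytes) -> bytes:
    """Build a row key as MD5(id) + id.

    The 16-byte digest prefix spreads keys across the key space, and the
    appended identifier keeps keys distinct even if two digests collide.
    """
    return hashlib.md5(identifier).digest() + identifier


def original_id(row_key: bytes) -> bytes:
    """Recover the identifier by dropping the fixed-length digest prefix."""
    return row_key[MD5_LEN:]


# Hypothetical identifiers, one string and one long.
key_a = salted_row_key(b"user-1001")
key_b = salted_row_key(long_to_bytes(1001))
```

Because the digest prefix has a fixed length, the original identifier can always be read back from the key, which is why two different identifiers can never map to the same row key under this scheme. The cost is a longer key: 16 bytes plus the length of the identifier.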
