"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> So: what are your input data, and what is the
> distribution among them?

With good enough hash functions one shouldn't need to care about the
input distribution.  Basically functions like SHA can be used as
extractors:

    http://en.wikipedia.org/wiki/Extractor

If there's a concern that the input distribution is specially
concocted to give nonuniform results with some known hash function,
then use one unknown to the input provider, e.g.

    import hmac
    def hash(obj, key='some string unknown to the input source'):
        return int(hmac.HMAC(key, repr(obj)).hexdigest()[:4], 16)

Anyway I don't have the impression that the OP is concerned with this
type of issue.  Otherwise s/he'd want much longer hashes than 16 bits.
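
For anyone trying the keyed-hash idea on Python 3, a rough equivalent might look like the sketch below. The names keyed_hash16 and SECRET_KEY and the choice of SHA-256 are my assumptions for illustration, not part of the snippet above, which relied on the old MD5 default. Python 3 wants bytes for the key and message and an explicit digest.

```python
import hashlib
import hmac

# Hypothetical secret: any byte string the input source cannot learn.
SECRET_KEY = b"some string unknown to the input source"


def keyed_hash16(obj, key=SECRET_KEY):
    """Return a 16-bit keyed hash of repr(obj) as an int in [0, 65535]."""
    # HMAC over the object's repr, keyed so that an adversary who does not
    # know the key cannot craft inputs that cluster in a few buckets.
    mac = hmac.new(key, repr(obj).encode("utf-8"), hashlib.sha256)
    # Keep only the first 4 hex digits, i.e. 16 bits, as in the original.
    return int(mac.hexdigest()[:4], 16)


if __name__ == "__main__":
    for item in ("spam", 42, (1, 2, 3)):
        print(item, keyed_hash16(item))
```

Since only 16 bits are kept, collisions are expected after a few hundred inputs whatever the key is; taking a longer slice of the hex digest gives a longer hash.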