New submission from Raymond Hettinger:

This tracker item is for a thought experiment I'm running, where I can collect thoughts and discussion in one place. It is not an active proposal for inclusion in Python.
The idea is to greatly speed up set/dict lookups of unicode values by skipping the exact comparison when both objects are exactly of the unicode type and their 64-bit hash values are known to match.

Given SipHash and hash randomization, we get a 1 in 2**64 chance of a false positive (which is better than the error rate for non-ECC DRAM itself). However, since SipHash isn't cryptographically secure, presumably a malicious chooser of keys could generate a false positive on purpose.

This technique is currently used by git and mercurial, which use hash values for file and version graphs without checking for an exact match (because the chance of a false positive is vanishingly small).

The Python test suite passes, as do the test suites for a number of packages I have installed.

----------
assignee: rhettinger
components: Interpreter Core
files: assume_perf_uni_hash.diff
keywords: patch
messages: 238552
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Experiment: Assume that exact unicode hashes are perfect discriminators
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file38565/assume_perf_uni_hash.diff

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23712>
_______________________________________
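
For anyone who wants to play with the shortcut described in this message, here is a rough pure-Python sketch. It is not the attached C patch and does not reflect how CPython's dict is actually implemented; the class name HashTrustingTable and the hash_func and nbuckets parameters are made up for illustration. The table stores each key's hash next to the key and, when both keys are exact str objects with equal stored hashes, skips the equality comparison. A deliberately weak hash (len) is plugged in to show what a false positive looks like.

```python
class HashTrustingTable:
    """Toy chained hash table (hypothetical, for illustration only).

    Exact str keys are considered equal as soon as their hashes match;
    all other keys fall back to a normal == comparison.
    """

    def __init__(self, hash_func=hash, nbuckets=8):
        self._hash = hash_func  # injectable so collisions can be forced
        self._buckets = [[] for _ in range(nbuckets)]

    @staticmethod
    def _matches(entry_hash, entry_key, probe_hash, probe_key):
        if entry_hash != probe_hash:
            return False
        if type(entry_key) is str and type(probe_key) is str:
            return True  # the shortcut: trust the hash, skip the comparison
        return entry_key == probe_key

    def __setitem__(self, key, value):
        h = self._hash(key)
        bucket = self._buckets[h % len(self._buckets)]
        for i, (eh, ek, _) in enumerate(bucket):
            if self._matches(eh, ek, h, key):
                bucket[i] = (eh, ek, value)  # overwrite the existing entry
                return
        bucket.append((h, key, value))

    def __getitem__(self, key):
        h = self._hash(key)
        for eh, ek, ev in self._buckets[h % len(self._buckets)]:
            if self._matches(eh, ek, h, key):
                return ev
        raise KeyError(key)


# With a weak hash, two different strings of equal length collide,
# so the lookup of "cd" wrongly finds the entry stored under "ab".
weak = HashTrustingTable(hash_func=len)
weak["ab"] = 1
false_positive = weak["cd"]

# Odds of an accidental 64-bit collision between two given keys.
collision_odds = 1 / 2**64
```

With the default hash_func, the same false positive would need two distinct strings whose full 64-bit hashes agree, which is the 1 in 2**64 event discussed above. Non-str keys, and str subclasses, still go through the ordinary equality check in this sketch.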