[issue23712] Experiment: Assume that exact unicode hashes are perfect discriminators

Raymond Hettinger Thu, 19 Mar 2015 12:58:23 -0700

New submission from Raymond Hettinger:

This tracker item collects the thoughts and discussion about a thought
experiment I'm running, so that they are all in one place.  It is not an
active proposal for inclusion in Python.


The idea is to greatly speed up set/dict lookups of unicode values by skipping
the full equality comparison when both keys are of exact unicode type (not a
subclass) and their 64-bit hash values match.
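
A rough Python sketch of the idea, assuming a toy open-addressing table (the
ToyDict class and its assume_perfect_str_hash flag are hypothetical, and this is
not the code in assume_perf_uni_hash.diff).  The exact-type test mirrors the
"unicode type is exact" condition: a str subclass could override __eq__, so
only exact str keys are allowed to take the shortcut.

```python
class ToyDict:
    """Toy open-addressing table; hypothetical, not CPython's dict."""

    def __init__(self, size=8, assume_perfect_str_hash=False):
        self.slots = [None] * size  # each slot: None or (hash, key, value)
        self.assume_perfect_str_hash = assume_perfect_str_hash

    def _matches(self, entry, h, key):
        entry_hash, entry_key, _ = entry
        if entry_key is key:        # identity is always a match
            return True
        if entry_hash != h:         # unequal hashes: keys must differ
            return False
        if (self.assume_perfect_str_hash
                and type(entry_key) is str and type(key) is str):
            return True             # experiment: equal hash => equal key
        return entry_key == key     # normal path: full comparison

    def _probe(self, key):
        h = hash(key)
        i = h % len(self.slots)
        for _ in range(len(self.slots)):
            entry = self.slots[i]
            if entry is None or self._matches(entry, h, key):
                return i
            i = (i + 1) % len(self.slots)  # linear probing, for simplicity
        raise RuntimeError("table full (no resizing in this sketch)")

    def __setitem__(self, key, value):
        self.slots[self._probe(key)] = (hash(key), key, value)

    def __getitem__(self, key):
        entry = self.slots[self._probe(key)]
        if entry is None:
            raise KeyError(key)
        return entry[2]


d = ToyDict(assume_perfect_str_hash=True)
d["spam"] = 1
d["eggs"] = 2
print(d["spam"], d["eggs"])  # 1 2
```

The only change from the normal path is the one early return in _matches; every
other step of the probe is unchanged.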

Given SipHash and hash randomization, two distinct strings have a 1 in 2**64
chance of producing a false positive (which is better than the error rate of
non-ECC DRAM itself).
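
The arithmetic behind the 1 in 2**64 figure can be sketched as follows,
assuming hash values behave like independent, uniformly random integers of the
platform's hash width (an idealization; sys.hash_info.width reports that width,
which is 64 on 64-bit builds).  A wrong answer needs two distinct strings with
the same hash, so the expected number of such pairs among n strings grows
roughly as n**2 / 2**65.

```python
import sys


def pair_collision_probability(bits=sys.hash_info.width):
    # Chance that two distinct strings share a hash, for an ideal hash.
    return 2.0 ** -bits


def expected_colliding_pairs(n, bits=sys.hash_info.width):
    # By linearity of expectation, add up the 2**-bits chance
    # over all n*(n-1)/2 pairs of distinct strings.
    return n * (n - 1) / 2 * pair_collision_probability(bits)


print(f"{pair_collision_probability(64):.2e}")       # 5.42e-20
print(f"{expected_colliding_pairs(10**6, 64):.2e}")  # 2.71e-08
```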

However, since SipHash isn't cryptographically secure, a malicious chooser of
keys could presumably generate a false positive on purpose.
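
To make the malicious-key concern concrete, here is a sketch using a
deliberately weak, hypothetical 16-bit hash (weak_hash, built on hashlib.blake2b
with digest_size=2) that an attacker can compute offline.  A birthday search
finds two different strings with the same hash after a few hundred tries on
average, and a hash-only equality rule then treats them as the same key.  This
does not attack Python's own str hash, which is 64 bits wide and keyed with a
random secret; against a 64-bit hash the attacker could evaluate, the same
search would take on the order of 2**32 tries.

```python
import hashlib
from itertools import count


def weak_hash(s: str) -> int:
    # Hypothetical stand-in: a 16-bit digest an attacker can compute offline.
    digest = hashlib.blake2b(s.encode(), digest_size=2).digest()
    return int.from_bytes(digest, "big")


def find_collision():
    # Birthday search; the pigeonhole principle guarantees a hit
    # within 2**16 + 1 distinct strings.
    seen = {}
    for i in count():
        s = f"key{i}"
        h = weak_hash(s)
        if h in seen:
            return seen[h], s
        seen[h] = s


def shortcut_equal(a: str, b: str) -> bool:
    # The experimental rule: equal hashes on exact str mean equal keys.
    return weak_hash(a) == weak_hash(b)


a, b = find_collision()
print(a != b, shortcut_equal(a, b))  # True True
```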

This technique is already used by Git and Mercurial, which use hash values to
identify files and nodes of the version graph without checking for an exact
match (because the chance of a false positive is vanishingly small).
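
For comparison, a minimal sketch of Git-style content addressing: a blob's ID is
the SHA-1 of a header of the form b"blob <size>\x00" followed by the content,
and a store that already holds that ID never compares the bytes again.  The
ContentStore class is hypothetical and far simpler than Git's object database;
note also that SHA-1 digests are 160 bits, much wider than the 64-bit hashes
discussed here.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    # Git names a blob by SHA-1 over a "blob <size>\0" header plus content.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


class ContentStore:
    """Toy content-addressed store: objects are identified only by digest."""

    def __init__(self):
        self.objects = {}

    def put(self, content: bytes) -> str:
        oid = git_blob_id(content)
        # Like the proposed shortcut, a matching digest is taken to mean the
        # content is already stored; the bytes themselves are never compared.
        self.objects.setdefault(oid, content)
        return oid


print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```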

The Python test suite passes, as do the test suites for a number of packages I
have installed.

----------
assignee: rhettinger
components: Interpreter Core
files: assume_perf_uni_hash.diff
keywords: patch
messages: 238552
nosy: rhettinger
priority: normal
severity: normal
status: open
title: Experiment:  Assume that exact unicode hashes are perfect discriminators
type: performance
versions: Python 3.5
Added file: http://bugs.python.org/file38565/assume_perf_uni_hash.diff

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue23712>
_______________________________________