On Sat, Dec 31, 2011 at 8:29 PM, Paul McMillan <p...@mcmillan.ws> wrote:
> > I'm not too concerned about a 3rd party being able to guess the random > seed > > -- this would require much more effort on their part, since they would > have > > to generate a new set of colliding keys each time they think they have > > guessed the hash > > This is incorrect. Once an attacker has guessed the random seed, any > operation which reveals the ordering of hashed objects can be used to > verify the answer. JSON responses would be ideal. In fact, an attacker > can do a brute-force attack of the random seed offline. Once they have > the seed, generating collisions is a fast process. > Still, it would represent an effort for the attacker of a much greater magnitude than the current attack. It's all a trade-off -- at some point it'll just be easier for the attacker to use some other vulnerability. Also the class of vulnerable servers would be greatly reduced. > The goal isn't perfection, but we need to do better than a simple > salt. Perhaps. > I propose we modify the string hash function like this: > > https://gist.github.com/0a91e52efa74f61858b5 > > This code is based on PyPy's implementation, but the concept is > universal. Rather than choosing a single short random seed per > process, we generate a much larger random seed (r). As we hash, we > deterministically choose a portion of that seed and incorporate it > into the hash process. This modification is a minimally intrusive > change to the existing hash function, and so should not introduce > unexpected side effects which might come from switching to a different > class of hash functions. > I'm not sure I understand this. What's the worry about "a different class of hash functions"? (It may be clear that I do not have a deep mathematical understanding of hash functions.) > I've worked through this code with Alex Gaynor, Antoine Pitrou, and > Victor Stinner, and have asked several mathematicians and security > experts to review the concept. The reviewers who have gotten back to > me thus far have agreed that if the initial random seed is not flawed, I forget -- what do we do on systems without urandom()? (E.g. Windows?) > this should not overly change the properties of the hash function, but > should make it quite difficult for an attacker to deduce the necessary > information to predictably cause hash collisions. This function is not > designed to protect against timing attacks, but should be nontrivial > to reverse even with access to timing data. > Let's worry about timing attacks another time okay? > Empirical testing shows that this unoptimized python implementation > produces ~10% slowdown in the hashing of ~20 character strings. This > is probably an acceptable trade off, and actually provides better > performance in the case of short strings than a high-entropy > fixed-length seed prefix. > Hm. I'm not sure I like the idea of extra arithmetic for every character being hashed. But I like the idea of a bigger random seed from which we deterministically pick some part. How about just initializing x to some subsequence of the seed determined by e.g. the length of the hashed string plus a few characters from it? -- --Guido van Rossum (python.org/~guido)
_______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com