Re: [Python-Dev] Counting collisions for the win

Glenn Linderman Thu, 19 Jan 2012 21:27:09 -0800

On 1/19/2012 8:54 PM, Carl Meyer wrote:

Hi Victor,

On 01/19/2012 05:48 PM, Victor Stinner wrote:
[snip]

Using a randomized hash may
also break (indirectly) real applications because the application
output is also somehow "randomized". For example, in the Django test
suite, the HTML output is different at each run. Web browsers may
render the web page differently, or crash, or ... I don't think that
Django would like to sort attributes of each HTML tag, just because we
wanted to fix a vulnerability.

I'm a Django core developer, and if it is true that our test-suite has a
dictionary-ordering dependency that is expressed via HTML attribute
ordering, I consider that a bug and would like to fix it. I'd be
grateful for, not resentful of, a change in CPython that revealed the
bug and prompted us to fix it. (I presume that it is true, as it sounds
like you experienced it directly; I don't have time to play around at
the moment, but I'm surprised we haven't seen bug reports about it from
users of 64-bit Pythons long ago). I can't speak for the core team, but
I doubt there would be much disagreement on this point: ideally Django
would run equally well on any implementation of Python, and as far as I
know none of the alternative implementations guarantee hash or
dict-ordering compatibility with CPython.

I don't have the expertise to speak otherwise to the alternatives for
fixing the collisions vulnerability, but I don't believe it's accurate
to presume that Django would not want to fix a dict-ordering dependency,
and use that as a justification for one approach over another.

Carl

It might be a good idea to have a way to seed the hash with some value to allow testing with different dict orderings; this would let tests be developed on one Python implementation and still be immune to the different orderings of other implementations. However, randomizing the hash not only fails to solve the problem for long-running applications, it also makes performance non-deterministic from one run to the next, even with exactly the same data: a different (random) seed could sporadically cause collisions with data that usually performs well, and there would be little explanation for it and little way to reproduce the problem in order to report or understand it.
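A minimal sketch of the seeded-testing idea, under stated assumptions: it uses the `PYTHONHASHSEED` environment variable, which later CPython releases (2.7.3, 3.2.3, 3.3) added for this kind of control and which did not exist when this message was written. The child program and the `run_with_seed` helper are hypothetical illustrations, not Django code. A set of strings is used because its iteration order depends on string hashes (a plain dict preserves insertion order in Python 3.7+ and would hide the effect).

```python
import os
import subprocess
import sys

# Hypothetical child program: renders HTML attribute names two ways.
CHILD = r"""
attrs = {"href": "/", "class": "nav", "id": "home", "title": "Home", "rel": "nofollow"}
names = set(attrs)
# Order-dependent rendering: follows set iteration order, i.e. the hash seed.
print(" ".join(names))
# Order-independent rendering: sorting removes the dependency on the seed.
print(" ".join(sorted(names)))
"""


def run_with_seed(seed: int) -> tuple[str, str]:
    """Run CHILD in a fresh interpreter with a fixed PYTHONHASHSEED (assumed available)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    lines = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env, capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return lines[0], lines[1]


if __name__ == "__main__":
    results = [run_with_seed(seed) for seed in range(5)]
    # Re-running with the same seed reproduces the same ordering.
    assert run_with_seed(0) == results[0]
    # The sorted rendering is identical no matter which seed was used.
    assert len({sorted_line for _, sorted_line in results}) == 1
    for seed, (unsorted_line, _) in enumerate(results):
        print(seed, unsorted_line)
```

Running a test suite under several fixed seeds exposes ordering dependencies, while a recorded seed makes any ordering (or collision-related slowdown) reproducible. This addresses only part of the concern above: a seed chosen at random and not logged still cannot be replayed.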

_______________________________________________
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
