Re: [Python-Dev] Changing the order of iteration over a dictionary

2012-01-20 Thread Nick Coghlan
On Fri, Jan 20, 2012 at 8:49 PM, Mark Shannon  wrote:
> So, don't be afraid to change that hash function :)

Changing it for 3.3 isn't really raising major concerns: the real
concern is with changing it in maintenance and security patches for
earlier releases. Security patches that may break production
applications aren't desirable, since it means admins have to weigh up
the risk of being affected by the security vulnerability against the
risk of breakage from the patch itself.

The collision counting approach was attractive because it looked like
it might offer a way out that was less likely to break deployed
systems. Unfortunately, I think the point Martin raised about just
opening a new (even more subtle) attack vector kills that idea dead.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Changing the order of iteration over a dictionary

2012-01-20 Thread Fred Drake
On Fri, Jan 20, 2012 at 5:49 AM, Mark Shannon  wrote:
> So, don't be afraid to change that hash function :)

Definitely.

The hash function *has* been changed in the past, and a lot of developers
were schooled in not relying on the iteration order.  That's a good thing,
as those developers now write tests of what's actually important rather
than relying on implementation details of the Python runtime.

A hash function that changes more often than during an occasional major
version update will encourage more developers to write better tests.  We
can think of it as an educational tool.


  -Fred

-- 
Fred L. Drake, Jr.    
"A person who won't read has no advantage over one who can't read."
   --Samuel Langhorne Clemens
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Changing the order of iteration over a dictionary

2012-01-20 Thread Mark Shannon

Hi,

One of the main sticking points over possible fixes for the 
hash-collision security issue seems to be a fear that changing the 
iteration order of a dictionary will break backwards compatibility.


The order of iteration has never been specified. In fact not only is it 
arbitrary, it cannot be determined from the contents of a dict alone; it 
may depend on the insertion order.


Changing a hash function is not the only change that will change the 
iteration order; any of the following will also do so:

* Changing the minimum size of a dict.
* Changing the load factor of a dict.
* Changing the resizing policy of a dict.
* Sharing of keys between dicts.

By treating iteration order as part of the API we are effectively ruling 
out ever making any improvements to the dict.


For example, my new dictionary implementation
https://bitbucket.org/markshannon/hotpy_new_dict/
reduces memory use by 47% for gcbench, and by about 20% for the 2to3 
benchmark, on my 32bit machine.

(Nice graphs: http://tinyurl.com/7qd2nnm http://tinyurl.com/6uqvl2x )

The new dict implementation (necessarily) changes the iteration order 
and will break code that relies on it.


If dict iteration order is to be treated as part of the API (and I think 
that is a very bad idea) then it should be documented, which will be 
difficult since it is barely deterministic.
This will also be a major problem for PyPy, Jython and IronPython, as 
they will have to reimplement their dicts.


So, don't be afraid to change that hash function :)

Cheers,
Mark
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com