Re: When convert two sets with the same elements to lists, are the lists always going to be the same?

Terry Reedy Fri, 04 May 2012 21:32:27 -0700

Peng, I actually am thinking about it.

Underlying problem: while unordered means conceptually unordered as far as the collection goes, the items in the collection, if homogeneous enough, may have a natural order, which users find hard to ignore. Even if not comparable, an implementation such as CPython that uses linear sequential memory will impose some order. Even if the implementation uses unordered (holographic?) memory, order will be imposed to iterate, as when creating a serialized representation of the collection. Abstract objects, concrete objects, and serialized representations are three different things, but people tend to conflate them.

Order consistency issues: if the unordered collection is iterated, when can one expect the order to be the same? Theoretically, essentially never, except that iterating dicts by keys, values, or key-value pairs is guaranteed to be consistent, which means that re-iterating has to be consistent. I actually think the same might as well be true for sets, although there is no doc that says so.
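A small sketch of what that dict guarantee means in practice. The names and values are made up for illustration; the set part at the end only shows what one can observe, not anything the docs promise.

```python
# Hypothetical data, for illustration only.
d = {"spam": 1, "eggs": 2, "ham": 3}

# With no modification in between, the three views line up:
# the i-th key goes with the i-th value.
keys = list(d.keys())
values = list(d.values())
items = list(d.items())
pairs = list(zip(keys, values))
print(pairs == items)

# Re-iterating an unmodified set: same elements for sure; whether the
# order matches is observed behavior, not a documented promise.
s = {"spam", "eggs", "ham"}
first = list(s)
second = list(s)
print(sorted(first) == sorted(second))
```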

If two collections are equal, should the iteration order be the same? It has always been true that if hash values collide, insertion order matters. However, a good hash function avoids hash collisions as much as possible in practical use cases. Without doing something artificial, as I did with the example, collisions should be especially rare on 64-bit builds. If one collection has a series of additions and deletions so that the underlying hash table has a different size than an equal collection built just from insertions, then order will also be different.
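Here is a minimal sketch of the collision case, separate from the example I gave earlier. It assumes CPython, where small ints hash to themselves and a small set starts with 8 slots, so 0 and 8 should land in the same slot; both are implementation details, so the two lists may or may not differ on your build. Sorting is the portable way to get the same list from equal sets.

```python
# Assumption: CPython, where hash(0) == 0 and hash(8) == 8, and a small
# set starts with 8 slots, so both values compete for slot 0.
forward = set()
forward.add(0)
forward.add(8)

backward = set()
backward.add(8)
backward.add(0)

# The sets are equal regardless of how they were built.
print(forward == backward)

# The lists may come out in different orders (implementation detail).
print(list(forward), list(backward))

# Sorting removes the dependence on insertion order and table layout.
print(sorted(forward) == sorted(backward))
```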

If the same collection is built by insertion in the same order, but in different runs, bugfix versions, or language versions, will iteration order be the same? Historically, it has been for CPython for about a decade, and people have come to depend on that constancy, in spite of warnings not to. Even core developers have not been immune, as the CPython test suite had a few set or dict iteration order dependencies until it was recently randomized.

Late last year, it became obvious that this constancy is a practical denial-of-service security hole. The option to randomize hashing for each run was added to 2.6, 2.7, 3.1, and 3.2. Randomized hashing by run is part of 3.3. So some of the discussion above is obsolete. The example I gave only works for that one run, as hash('a') changes each run. So iteration order now changes with each run in fact as well as in theory.
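One way to see the per-run effect without restarting by hand is to launch fresh interpreters with the PYTHONHASHSEED environment variable set. This is only a sketch: the helper name hash_in_fresh_interpreter is made up, it assumes Python 3.7 or later for capture_output, and the seeds 1 and 2 are arbitrary.

```python
import os
import subprocess
import sys

def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter run with the
    given PYTHONHASHSEED (hypothetical helper, for illustration)."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)

# Same seed: the string hash is reproducible from run to run.
same_seed_a = hash_in_fresh_interpreter("a", 1)
same_seed_b = hash_in_fresh_interpreter("a", 1)
print(same_seed_a == same_seed_b)

# Different seed: the hash will almost certainly differ, and with it
# the iteration order of sets and dicts keyed by strings.
other_seed = hash_in_fresh_interpreter("a", 2)
print(same_seed_a, other_seed)
```

Fixing the seed is handy for reproducing a failure that depends on iteration order, but code that needs a stable order should sort rather than rely on it.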

For the doc, the problem is what to say and where without being repetitious (and to get multiple people to agree ;-).


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
