> On Sep 14, 2016, at 3:50 PM, Eric Snow <ericsnowcurren...@gmail.com> wrote:
> 
>> 
>> Then, I'll do same to sets.
> 
> Unless I've misunderstood, Raymond was opposed to making a similar
> change to set.

That's right.  Here are a few thoughts on the subject before people starting 
running wild.

* For the compact dict, the space savings was a net win with the additional 
space consumed by the indices and the overallocation for the key/value/hash 
arrays being more than offset by the improved density of key/value/hash arrays. 
  However for sets, the net was much less favorable because we still need the 
indices and overallocation but can only offset the space cost by densifying 
only two of the three arrays.  In other words, compacting makes more sense when 
you have wasted space for keys, values, and hashes.  If you lose one of those 
three, it stops being compelling.

* The use pattern for sets is different from dicts.  The former has more hit or 
miss lookups.  The latter tends to have fewer missing key lookups. Also, some 
of the optimizations for the set-to-set operations make it difficult to retain 
set ordering without impacting performance.

* I pursued alternative path to improve set performance.  Instead of compacting 
(which wasn't much of space win and incurred the cost of an additional 
indirection), I added linear probing to reduce the cost of collisions and 
improve cache performance.  This improvement is incompatible with the 
compacting approach I advocated for dictionaries.

* For now, the ordering side-effect on dictionaries is non-guaranteed, so it is 
premature to start insisting the sets become ordered as well.  The docs already 
link to a recipe for creating an OrderedSet ( 
https://code.activestate.com/recipes/576694/ ) but it seems like the uptake has 
been nearly zero.  Also, now that Eric Snow has given us a fast OrderedDict, it 
is easier than ever to 
 build an OrderedSet from MutableSet and OrderedDict, but again I haven't 
observed any real interest because typical set-to-set data analytics don't 
really need or care about ordering.  Likewise, the primary use of fast 
membership testings is order agnostic.

* That said, I do think there is room to add alternative set implementations to 
PyPI.  In particular, there are some interesting special cases for orderable 
data where set-to-set operations can be sped-up by comparing entire ranges of 
keys (see 
https://code.activestate.com/recipes/230113-implementation-of-sets-using-sorted-lists
 for a starting point).  IIRC, PyPI already has code for set-like bloom filters 
and cuckoo hashing.

* I understanding that it is exciting to have a major block of code accepted 
into the Python core but that shouldn't open to floodgates to engaging in more 
major rewrites of other datatypes unless we're sure that it is warranted.


Raymond Hettinger



_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to