Quick summary of what I found when I last ran experiments with this idea:

* To get the same lookup performance, the density of index table would need to 
go down to around 25%. Otherwise, there's no way to make up for the extra 
indirection and the loss of cache locality.

* There was a small win on iteration performance because its cheaper to loop 
over a dense array than a sparse array (fewer memory access and elimination of 
the unpredictable branch).  This is nice because iteration performance matters 
in some key use cases.

* I gave up on ordering right away.  If we care about performance, keys can be 
stored in the order added; but no effort should be expended to maintain order 
if subsequent deletions occur.  Likewise, to keep set-to-set operations 
efficient (i.e. looping over the smaller input), no order guarantee should be 
given for those operations.  In general, we can let order happen but should not 
guarantee it and work to maintain it or slow-down essential operations to make 
them ordered.

* Compacting does make sets a little smaller but does cost an indirection and 
incurs a cost for switching index sizes between 1-byte arrays, 2-byte arrays, 
4-byte arrays, and 8-byte arrays.  Those don't seem very expensive; however, 
set lookups are already very cheap when the hash values are known (when they're 
not, the cost of computing the hash value tends to dominate anything done by 
the setobject itself).

* I couldn't find any existing application that would notice the benefit of 
making sets a bit smaller.  Most applications use dictionaries (directly or 
indirectly) everywhere, so compaction was an overall win.  Sets tend to be used 
more sparsely (no pun intended) and tend to be only a small part of overall 
memory usage. I had to consider this when bumping the load factor down to 60%, 
prioritizing speed over space.


Raymond

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Reply via email to