Tim Peters <t...@python.org> added the comment:

> Surprisingly, deleting a very large set takes much longer than creating it.

Luis, that's not surprising ;-)  When you create it, it's mostly the case that 
there's a vast chunk of raw memory from which many pieces are passed out in 
address order (to hold all the newly created Python objects).  Memory access is 
thus mostly sequential.  But when you delete it, that vast chunk of once-raw 
memory is visited in essentially random order (string hashes impose a 
pseudo-random order on where the pointers to string objects are stored within 
the set's vector), defeating all the hardware features that greatly benefit from 
sequential access.

More precisely, the set's internal vector is visited sequentially during 
deletion, but the string objects the pointers point _at_ are all over the 
place.  Even if nothing is swapped to disk, it's likely that visiting a string 
object during deletion will miss on all cache levels, falling back to (much 
slower) RAM.  Note that all the string objects must be visited during set 
deletion, in order to decrement their reference counts.
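
For anyone who wants to watch this on their own machine, here is a minimal 
timing sketch (not from the original report; the size N and the variable names 
are arbitrary choices).  It builds a large set of distinct strings, then times 
the deletion, which is where the refcount-decrementing pointer chasing happens:

    import time

    N = 10_000_000   # arbitrary size; shrink it if memory is tight

    t0 = time.perf_counter()
    s = {str(i) for i in range(N)}   # strings allocated roughly in address order
    t1 = time.perf_counter()
    del s    # teardown: each string is decref'ed, reached in hash (pseudo-random) order
    t2 = time.perf_counter()

    print(f"create: {t1 - t0:.2f}s    delete: {t2 - t1:.2f}s")

How the two numbers compare will depend on the machine and on how large N is 
relative to the CPU caches; the point is only that the teardown visits the 
string objects in an order the hardware prefetchers can't help with.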

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue32846>
_______________________________________