#13387: Improve MonoDict and TripleDict data structures
----------------------------------+-----------------------------------------
Reporter: nbruin | Owner: Nils Bruin
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-5.8
Component: memleak | Resolution:
Keywords: | Work issues:
Report Upstream: N/A | Reviewers:
Authors: Nils Bruin | Merged in:
Dependencies: #11521, #12313 | Stopgaps:
----------------------------------+-----------------------------------------
Comment (by nbruin):
Oh bother. Neither the iteration code nor the resize code is likely to be
safe. The resize code performs memory allocations throughout its inner loop,
so callbacks (and hence bucket modifications!) can happen there. The loop is
not robust against that in a rather fundamental way: if a callback fires on
an entry that has already been copied over to the new buckets, but before
the new buckets have been put into service, then the entry will get removed
from its old bucket (possibly causing mayhem), while its copy in the new
bucket will happily remain!
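To make the hazard concrete, here is a hypothetical sketch of such a resize loop (the names and layout are illustrative, not the actual MonoDict/TripleDict code):

```python
def resize(old_buckets, new_size):
    """Illustrative resize: copy entries into freshly hashed buckets."""
    # Allocation here can already trigger a GC, and hence callbacks.
    new_buckets = [[] for _ in range(new_size)]
    for bucket in old_buckets:
        for key, value in bucket:
            # This append may allocate too, so a weakref callback can
            # fire mid-loop.  If that callback removes the entry from
            # its OLD bucket, the copy already placed in new_buckets
            # survives, leaving a stale entry once the new table is
            # installed.
            new_buckets[hash(key) % new_size].append((key, value))
    return new_buckets
```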
I'm not sure how to formulate resize otherwise. Should we trigger a
garbage collection first, to ensure all collectable entries have already
been removed? If we start rearranging the buckets right after that, we
shouldn't get any callbacks. Unless GC isn't always completely
thorough: can a collection issued right after another one ever still find
something to collect? Given how difficult it can be to break cycles, I can
imagine it could. For instance, if a weakref callback causes
something else to become garbage, will that still be collected in the same
call?
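The "collect first" idea might look like the sketch below. Note the assumption baked into it, which is exactly what the questions above cast doubt on: that a single `gc.collect()` flushes every pending callback.

```python
import gc

def resize_with_precollect(old_buckets, new_size):
    """Resize after forcing a full collection (illustrative sketch)."""
    # Run a full collection first: ideally every dead entry has now had
    # its callback run, so the copy loop below triggers no further ones.
    # CAVEAT: this assumes one gc.collect() call is exhaustive, which
    # may fail if a callback itself creates new cyclic garbage.
    gc.collect()
    new_buckets = [[] for _ in range(new_size)]
    for bucket in old_buckets:
        for key, value in bucket:
            new_buckets[hash(key) % new_size].append((key, value))
    return new_buckets
```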
Another approach might be to "lock" the dictionary so that callbacks get
queued rather than executed. When we unlock the dictionary we could
process the queue (or purge the dict, so that we're sure any outstanding
callback has become irrelevant).
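The locking idea could be sketched as follows (a hypothetical standalone illustration, not the actual Sage classes; the names are made up):

```python
class CallbackQueueingDict:
    """Sketch: defer weakref callbacks while the table is restructured."""

    def __init__(self):
        self._locked = False
        self._pending = []   # keys whose callbacks fired while locked
        self._data = {}

    def _callback(self, key):
        if self._locked:
            # Table is mid-resize: just remember the key for later.
            self._pending.append(key)
        else:
            self._data.pop(key, None)

    def resize(self):
        self._locked = True
        try:
            pass  # ... rearrange buckets; callbacks only enqueue ...
        finally:
            self._locked = False
            # Process the deferred callbacks against the final layout.
            while self._pending:
                self._data.pop(self._pending.pop(), None)
```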
The iterator is less inherently unsafe. It's just that in between "yield"s
there is likely to be allocation activity of some sort, so GCs could modify
the dict.
In any case, I don't think the work on this ticket makes the situation
worse than it already is, which apparently is not so bad: resize
operations are very rare. It may well be they don't happen in the test
suite at all.
Furthermore, the problem only arises if a new bucket needs to be
''extended'' during copying. Since buckets are supposed to be really
small, this should be even rarer (python lists are initialized with room
for a certain number of entries, right?). If we knew in advance how big
the buckets would have to be, we could simply preallocate them with
sufficient length, but we don't.
In fact, I wonder how robust Python's own WeakValueDictionary and
WeakKeyDictionary are in the face of these issues. Perhaps they are OK
because their memory allocations are (near) atomic.
That would be an option too: implement the hash tables of TripleDict and
MonoDict with open addressing rather than the list-of-lists we have now.
I'm not the right person to make such a change, though; I'm sure
implementing good hash tables has all kinds of pitfalls I am not aware of.
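For reference, a minimal open-addressing table with linear probing might look like this (a toy sketch of the suggested layout, not a drop-in replacement; it omits resizing, load-factor tracking, and deletion):

```python
_EMPTY = object()  # sentinel for unused slots

class OpenAddressingDict:
    """Toy open-addressing hash table with linear probing."""

    def __init__(self, size=8):
        # size must be a power of two so that `& mask` works as modulo.
        self._slots = [_EMPTY] * size

    def _probe(self, key):
        # Walk slots until we hit an empty one or a matching key.
        # (A real table must resize before it fills, or this loops.)
        mask = len(self._slots) - 1
        i = hash(key) & mask
        while True:
            slot = self._slots[i]
            if slot is _EMPTY or slot[0] == key:
                return i
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        self._slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self._slots[self._probe(key)]
        if slot is _EMPTY:
            raise KeyError(key)
        return slot[1]
```

The appeal for the GC problem is that insertion into a preallocated slot array involves no per-entry list growth, so the copy loop of a resize allocates (almost) nothing. Deletion, however, needs tombstones, which is one of the pitfalls alluded to above.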
'''Conclusion:''' This ticket should still be merged because it's likely
strictly an improvement over the status quo, but there are some
fundamental design problems due to an extremely remote possibility of
unfortunate GC occurrences in the middle of dictionary update operations.
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/13387#comment:35>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB