#13387: Improve MonoDict and TripleDict data structures
----------------------------------+-----------------------------------------
Reporter: nbruin | Owner: Nils Bruin
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-5.8
Component: memleak | Resolution:
Keywords: | Work issues:
Report Upstream: N/A | Reviewers:
Authors: Nils Bruin | Merged in:
Dependencies: #11521, #12313 | Stopgaps:
----------------------------------+-----------------------------------------
Comment (by nbruin):
Oh bother. Neither the iteration code nor the resize code is likely to be
safe. The resize code performs memory allocations throughout its inner loop,
so callbacks (and hence bucket modifications!) can happen there. The loop is
not robust against that in a rather fundamental way: if a callback fires on
an entry that has already been copied over to the new buckets, but before
the new buckets have been put into service, then the entry will get removed
from its old bucket (possibly causing mayhem), while its copy in the new
bucket will happily remain!
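To make the hazard concrete, here is a hypothetical sketch of such a resize loop (the names and layout are illustrative, not the actual MonoDict/TripleDict code):

```python
def resize(old_buckets, new_size):
    """Illustrative resize: copy entries into freshly hashed buckets."""
    # Allocation here can already trigger a GC, and hence callbacks.
    new_buckets = [[] for _ in range(new_size)]
    for bucket in old_buckets:
        for key, value in bucket:
            # This append may allocate too, so a weakref callback can
            # fire mid-loop.  If that callback removes the entry from
            # its OLD bucket, the copy already placed in new_buckets
            # survives, leaving a stale entry once the new table is
            # installed.
            new_buckets[hash(key) % new_size].append((key, value))
    return new_buckets
```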
I'm not sure how to formulate resize otherwise. Should we trigger a
garbage collection first, to ensure all collectable entries have already
been removed? If we start rearranging the buckets right after that, we
shouldn't get any callbacks. Unless GC isn't always completely
thorough: can a collection issued right after another one ever still find
something to collect? Given how difficult it can be to break cycles, I can
imagine it could. For instance, if a weakref callback causes
something else to become garbage, will that still be collected in the same
call?
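The "collect first" idea might look like the sketch below. Note the assumption baked into it, which is exactly what the questions above cast doubt on: that a single `gc.collect()` flushes every pending callback.

```python
import gc

def resize_with_precollect(old_buckets, new_size):
    """Resize after forcing a full collection (illustrative sketch)."""
    # Run a full collection first: ideally every dead entry has now had
    # its callback run, so the copy loop below triggers no further ones.
    # CAVEAT: this assumes one gc.collect() call is exhaustive, which
    # may fail if a callback itself creates new cyclic garbage.
    gc.collect()
    new_buckets = [[] for _ in range(new_size)]
    for bucket in old_buckets:
        for key, value in bucket:
            new_buckets[hash(key) % new_size].append((key, value))
    return new_buckets
```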
Another approach might be to "lock" the dictionary so that callbacks get
queued rather than executed. When we unlock the dictionary we could
process the queue (or purge the dict, so that we're sure any outstanding
callback has become irrelevant).
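The locking idea could be sketched as follows (a hypothetical standalone illustration, not the actual Sage classes; the names are made up):

```python
class CallbackQueueingDict:
    """Sketch: defer weakref callbacks while the table is restructured."""

    def __init__(self):
        self._locked = False
        self._pending = []   # keys whose callbacks fired while locked
        self._data = {}

    def _callback(self, key):
        if self._locked:
            # Table is mid-resize: just remember the key for later.
            self._pending.append(key)
        else:
            self._data.pop(key, None)

    def resize(self):
        self._locked = True
        try:
            pass  # ... rearrange buckets; callbacks only enqueue ...
        finally:
            self._locked = False
            # Process the deferred callbacks against the final layout.
            while self._pending:
                self._data.pop(self._pending.pop(), None)
```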
The iterator is less inherently unsafe. It's just that in between "yield"s
there is likely to be allocation activity of some sort, so GCs could modify
the dict.
In any case, I don't think the work on this ticket makes the situation
worse than it already is, which apparently is not so bad: resize
operations are very rare. It may well be they don't happen in the test
suite at all.
Furthermore, the problem only arises if a new bucket needs to be
''extended'' during copying. Since buckets are supposed to be really
small, this should be even rarer (python lists are initialized with room
for a certain number of entries, right?). If we knew in advance how big
the buckets would have to be, we could simply preallocate them with
sufficient length, but we don't.
In fact, I wonder how robust Python's own WeakValueDictionary and
WeakKeyDictionary are in the face of these issues. Perhaps they are OK
because their memory allocations are (near) atomic.
That would be an option too: implement the hash tables of TripleDict and
MonoDict with open addressing rather than the list-of-lists we have now.
I'm not the right person to make such a change, though; I'm sure
implementing good hash tables has all kinds of pitfalls I am not aware of.
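For reference, a minimal open-addressing table with linear probing might look like this (a toy sketch of the suggested layout, not a drop-in replacement; it omits resizing, load-factor tracking, and deletion):

```python
_EMPTY = object()  # sentinel for unused slots

class OpenAddressingDict:
    """Toy open-addressing hash table with linear probing."""

    def __init__(self, size=8):
        # size must be a power of two so that `& mask` works as modulo.
        self._slots = [_EMPTY] * size

    def _probe(self, key):
        # Walk slots until we hit an empty one or a matching key.
        # (A real table must resize before it fills, or this loops.)
        mask = len(self._slots) - 1
        i = hash(key) & mask
        while True:
            slot = self._slots[i]
            if slot is _EMPTY or slot[0] == key:
                return i
            i = (i + 1) & mask

    def __setitem__(self, key, value):
        self._slots[self._probe(key)] = (key, value)

    def __getitem__(self, key):
        slot = self._slots[self._probe(key)]
        if slot is _EMPTY:
            raise KeyError(key)
        return slot[1]
```

The appeal for the GC problem is that insertion into a preallocated slot array involves no per-entry list growth, so the copy loop of a resize allocates (almost) nothing. Deletion, however, needs tombstones, which is one of the pitfalls alluded to above.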
'''Conclusion:''' This ticket should still be merged because it's likely
strictly an improvement over the status quo, but there are some
fundamental design problems due to an extremely remote possibility of
unfortunate GC occurrences in the middle of dictionary update operations.
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/13387#comment:35>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB