#715: Parents probably not reclaimed due to too much caching
-------------------------------------------------------------------+--------
       Reporter:  robertwb                                         |         
Owner:  somebody                     
           Type:  defect                                           |        
Status:  needs_work                   
       Priority:  major                                            |     
Milestone:  sage-pending                 
      Component:  coercion                                         |    
Resolution:                               
       Keywords:  weak cache coercion Cernay2012                   |   Work 
issues:                               
Report Upstream:  N/A                                              |     
Reviewers:  Jean-Pierre Flori, Simon King
        Authors:  Simon King, Jean-Pierre Flori                    |     Merged 
in:                               
   Dependencies:  #9138, #11900, #11599, to be merged with #11521  |      
Stopgaps:                               
-------------------------------------------------------------------+--------

Comment (by SimonKing):

 Replying to [comment:198 nbruin]:
 > When reviewing #12313 I observed a possible problem for slight leaking
 > ...
 > By all means, if you have a good argument why this is not necessary,
 revert to Positive Review (and I'd be interested in seeing the argument).

 Yes, it is a potential leak. The argument would be:

  * ''If'' all items indexed by a certain key triple are gone, we are left
 with three size_t and with one pointer to an empty list, that will not be
 collected; that's just a few bytes.
  * It is (I believe) quite likely that the same key triple will be used
 again. Hence, the few bytes will actually be used again.
  * As long as it is not noticeable in a practical computation, I am not
 sure if it is a good idea to slow deallocation down with a test "if
 len(L)==0".

 OK, that is not more than a heuristical argument. The patch would allow a
 (I believe) very small leak, for the sake of a (probably) very small
 speed-up.

 > '''unweakreffable keys'''
 >
 > Note that currently, any key that doesn't allow weakreffing, gets a
 (permanent, global) strong ref in `_refcache` in the value list, keyed by
 their `id`. That's worse than a normal `dict`. A possible solution is to
 have a `strongrefcache` on the `MonoDict` or `TripleDict` itself. Then at
 least the references disappear when the Dict itself goes.

 Hm. It is quite a long time ago that I wrote the code, so I need some time
 to reconstruct what I thought.

 The data of a `TripleDict` are stored in buckets. The buckets just provide
 memory locations of the keys. This is in order to make access to the data
 very fast: Otherwise, one would have to do special cases for keys that are
 weak-refable and those that are not. By consequence, the weak references
 (with callback function) to the keys need to be stored somewhere else: in
 _refcache. In that way, items whose keys got garbage collected can be
 removed from cache.

 But why did I put ''strong'' references in _refcache as well? Let
 (k1,k2,k3) be a key, and assume that k1 is not weak-refable. Assume
 further that no external reference to k1 is left, but there are external
 strong references to k2 and k3. If I would not store a strong reference to
 k1 in _refcache, then k1 would be garbage collected. Since we do not have
 a weak reference with callback for k1 and since k2 and k3 can not be
 collected, the item for (k1,k2,k3) remains in the `TripleDict`. Hence,
 when iterating over the items (and there is existing code that does
 iterate over the items!), we would meet a reference to k1 ''after'' it was
 garbage collected. That means a segfault occurs.

 In other words: If k2 and k3 are not collectable and k1 can not be weak-
 refed, then we must ensure that k1 stays alive. The solution is to keep a
 strong reference to k1 in _refcache.

 But now I wonder: Wouldn't it be better to have _refcache not as a global
 dictionary, but have a separate _refcache for each `TripleDict`, so that
 it gets collected if the `TripleDict` gets collected? Is that your
 suggestion?

 I think this would be worth trying.

 > You'd have to ensure that whenever an entry gets deleted from the
 `MonoDict` or the `TripleDict`, that any references in `strongrefcache` to
 relevant key components get removed too. Especially for `TripleDict`, this
 needs to happen in `TripleDictEraser` too, because if any weakreffable key
 component gets GCd, the whole entry gets removed, so strong refs to other
 key components should be released.

 As I have pointed out, it is important that weak or strong references are
 stored in _refcache. But perhaps the items in _refcache should be triples
 of weak or strong references? If I am not mistaken, if (k1,k2,k3) is a
 key, then it is uniquely determined by (id(k1),id(k2),id(k3)). We store
 weak references (if possible) that provide (id(k1),id(k2),id(k3)). Hence,
 the callback function of the weak reference can simply delete this entry.

 __Conclusion__

  * I will try if the `"if len(L)==0"` test leads to a slow-down
  * I will try to replace the global _refcache by a dictionary that is
 local to each `TripleDict`.
  * I will store the references provided by _refcache in a different form,
 so that they can more easily be deleted.

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/715#comment:199>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.

Reply via email to