#12313: Fix yet another memory leak caused by caching of coercion data
--------------------------------------------------------+-------------------
       Reporter:  SimonKing                              |         Owner:
           Type:  defect                                 |        Status:  new
       Priority:  major                                  |     Milestone:  sage-5.8
      Component:  memleak                                |    Resolution:
       Keywords:  coercion weak dictionary               |   Work issues:
Report Upstream:  N/A                                    |     Reviewers:  Simon King, Jean-Pierre Flori, John Perry, Nils Bruin
        Authors:  Simon King, Jean-Pierre Flori          |     Merged in:
   Dependencies:  #715, #11521, #12215, #13746, #13378   |      Stopgaps:
--------------------------------------------------------+-------------------

Comment (by nbruin):

 I am not able to reproduce this error. However, "coerce_dict.pyx", line
 1079 is
 {{{
 # Test whether the old references are still active
 r1,r2,r3 = <tuple>(self._refcache[h1,h2,h3])
 }}}
 the line "homset.py", line 266:
 {{{
     _cache[key] = KeyedRef(H, _cache.eraser, (id(X),id(Y),id(category)))
 }}}
 was introduced in #11521, in
 [http://trac.sagemath.org/sage_trac/attachment/ticket/11521/trac_11521_callback.patch trac_11521_callback.patch]
 to be precise.

 I'm a little worried about this implementation, because it relies on
 implementation details of `TripleDict` that aren't really part of the API
 (in particular, on the `TripleDict.eraser` attribute and its call
 convention). I think this should be refactored into a method
 {{{
 _cache.set_weakvalued(key, H)
 }}}
 so that the tripledict can figure out how to put a properly formatted
 callback on the value.
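
 A minimal sketch of that refactoring (the `set_weakvalued` name comes from
 the suggestion above; the `TripleDict` internals shown are simplified
 stand-ins for illustration, not Sage's actual bucket-based implementation):
 {{{
 #!python
 import weakref

 class TripleDict:
     """Toy stand-in for Sage's TripleDict; the real one hashes id-triples
     into buckets, which is elided here."""

     def __init__(self):
         self._data = {}
         # Callback with the call convention this dict expects.
         self.eraser = self._erase

     def _erase(self, ref):
         # `ref.key` was attached by KeyedRef; pop tolerates double removal.
         self._data.pop(ref.key, None)

     def set_weakvalued(self, key, value):
         # Callers hand us the value; *we* decide how to wrap it in a
         # properly formatted weak reference, instead of exposing `eraser`.
         self._data[key] = weakref.KeyedRef(value, self.eraser, key)

     def get(self, key):
         return self._data[key]()

 # The homset cache line quoted above would then read, approximately:
 #     _cache.set_weakvalued(key, H)
 d = TripleDict()
 obj = type("H", (), {})()            # any weak-referenceable stand-in value
 d.set_weakvalued((1, 2, 3), obj)
 }}}
 This keeps the callback format a private detail of the dict, so a future
 change to the eraser's call convention cannot break external callers.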

 A quick "solution" might be to merge #13387 as well, since that gets rid
 of `self._refcache`. However, I'm a little puzzled how this error can
 occur:

  - By the time we arrive at line 1079 we have succeeded in finding the key
 triple `(h1,h2,h3)`.
  - The KeyError indicates that `_refcache` doesn't contain the key
 anymore.
  - But key removal from `_refcache` and key removal from `bucket` are
 inextricably linked! This means that a key removal was triggered somewhere
 between finding the element in the bucket and looking it up in
 `_refcache`. But that should mean that `bucket` got modified too! That's
 really worrisome, because at this point we're ready to scribble in
 `bucket` at a location we've counted before.

 Indeed, if we are allocating memory anywhere in the loop that walks
 through the bucket we could be triggering a garbage collection, which
 could indeed include callbacks that modify this very bucket. I don't think
 any of the lines here could do that, though, except for
 `(self._refcache[h1,h2,h3])` itself, if that needs to construct a tuple or
 something like that.

 So my guess is: The entry that's keyed `(17671424, 82669344, 20657216)` is
 indeed liable to get a callback from a garbage collection and thus be
 removed, but that hasn't happened yet. However, in preparing the call
 `self._refcache[h1,h2,h3]` the system needs to make a Python tuple
 `(h1,h2,h3)` to pass to `__getitem__`, and that triggers the very GC that
 deletes the entry from `bucket` and `_refcache`; hence the `KeyError`.

 In other words, we need to make absolutely sure that in this lookup loop,
 no memory allocations whatsoever are done, because any of those could
 trigger a GC that can corrupt our view of what the bucket should look
 like.
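
 The mechanism is easy to reproduce in miniature. In this toy version (all
 names invented for the example), an explicit `gc.collect()` stands in for
 a collection triggered by an incidental allocation such as `PyTuple_New`:
 {{{
 #!python
 import gc
 import weakref

 cache = {}

 def eraser(ref):
     # Weakref callback: exactly the kind of cleanup that can run in the
     # middle of an unrelated operation, because any allocation may start
     # a cyclic GC pass that invokes it.
     cache.pop(ref.key, None)

 class Value:
     pass

 v = Value()
 v.cycle = v                        # reference cycle: only cyclic GC frees it
 cache[(1, 2, 3)] = weakref.KeyedRef(v, eraser, (1, 2, 3))

 del v                              # now unreachable, but not yet collected
 still_present = (1, 2, 3) in cache # True: the entry looks perfectly alive
 gc.collect()                       # stand-in for allocation-triggered GC
 gone = (1, 2, 3) not in cache      # True: the callback removed it mid-flight
 }}}
 Between the membership test and the collection, nothing touched `cache`
 explicitly; the entry vanished purely as a side effect of the GC running
 the callback, which is the race hypothesized above.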

 Looking at the lookup loop in #13387 with this in mind, I think no memory
 allocations are made there, so if my diagnosis is correct, then merging
 that ticket would in fact be a proper fix.

 I still think we shouldn't be making `KeyedRef` instances using the
 `eraser` of a `TripleDict` outside the `TripleDict` itself, though.

 Looking at the code cython generates I think my diagnosis is a
 possibility:
 {{{
  *                         # Test whether the old references are still active
  *                         r1,r2,r3 = <tuple>(self._refcache[h1,h2,h3])             # <<<<<<<<<<<<<<
  *                         if (isinstance(r1,KeyedRef) and r1() is None) or \
  *                            (isinstance(r2,KeyedRef) and r2() is None) or \
  */
 ...
           __pyx_t_9 = PyTuple_New(3); if (unlikely(!__pyx_t_9))
           __Pyx_GOTREF(__pyx_t_9);
           PyTuple_SET_ITEM(__pyx_t_9, 0, __pyx_t_1);
           __Pyx_GIVEREF(__pyx_t_1);
           PyTuple_SET_ITEM(__pyx_t_9, 1, __pyx_t_5);
           __Pyx_GIVEREF(__pyx_t_5);
           PyTuple_SET_ITEM(__pyx_t_9, 2, __pyx_t_8);
           __Pyx_GIVEREF(__pyx_t_8);
 ...
           __pyx_t_8 = __Pyx_PyDict_GetItem(((PyObject *)__pyx_v_self->__pyx_base._refcache), ((PyObject *)__pyx_t_9));
           if (!__pyx_t_8) {__pyx_filename = __pyx_f[0]; __pyx_lineno = 1079; __pyx_clineno = __LINE__; goto __pyx_L1_error;}
 ...
 }}}
 As you can see, the lookup code does start with a `PyTuple_New` that could
 trigger a GC, leading to removal of the very key that gets looked up in the
 last line (where the `KeyError` would originate).
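
 If merging #13387 were not an option, a more local fix would follow the
 rule stated above: hoist every allocation out of the scan. A plain-Python
 sketch (the flat `bucket` layout and all names here are simplified
 assumptions, not Sage's actual data structure):
 {{{
 #!python
 import gc

 def lookup(refcache, bucket, h1, h2, h3):
     """Find (h1, h2, h3) in a flat bucket laid out as repeating
     [h1, h2, h3, value] groups, then fetch its entry from refcache,
     with the key tuple built before the scan starts."""
     key = (h1, h2, h3)       # the only tuple allocation, done up front
     enabled = gc.isenabled()
     gc.disable()             # belt and braces: no collection mid-scan
     try:
         for i in range(0, len(bucket), 4):
             if (bucket[i] == h1 and bucket[i + 1] == h2
                     and bucket[i + 2] == h3):
                 # No tuple is built here, so no GC can fire an eraser
                 # callback between finding the slot and this lookup.
                 return refcache[key]
         raise KeyError(key)
     finally:
         if enabled:
             gc.enable()
 }}}
 In Cython the same idea means checking that the generated C contains no
 `PyTuple_New` (or similar allocation) between locating the bucket slot and
 reading `_refcache`, which is what the #13387 loop reportedly achieves.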

 '''TL;DR:''' merge #13387

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/12313#comment:318>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, and MATLAB
