#12313: Fix yet another memory leak caused by caching of coercion data
--------------------------------------------------+-------------------------
       Reporter:  SimonKing                       |         Owner:              
                             
           Type:  defect                          |        Status:  
needs_review                             
       Priority:  major                           |     Milestone:  sage-5.3    
                             
      Component:  memleak                         |    Resolution:              
                             
       Keywords:  coercion weak dictionary        |   Work issues:              
                             
Report Upstream:  N/A                             |     Reviewers:  Simon King, 
Jean-Pierre Flori, John Perry
        Authors:  Simon King, Jean-Pierre Flori   |     Merged in:              
                             
   Dependencies:  #11521, #11599, #12969, #12215  |      Stopgaps:              
                             
--------------------------------------------------+-------------------------

Comment (by nbruin):

 OK! example implementation done. A couple of things I found:

  - Some of your doctests for MonoDict test TripleDict!

  - your iterators can recurse quite deeply if there are lots of empty
 buckets in
  a row. That can easily happen.

  - What happens if we find a dead entry in setitem? How can we be sure the
  pending callback for that doesn't happen on our new value? I think
 explicitly
  deleting the weakref should do the trick, but then we should ensure that
 nobody
  else gets a chance to keep those weakrefs alive! I've gone with that for
 now.
  An alternative is to let the callback check that the weakref stored in
 the
  relevant bucket is indeed a weakref and dead. Then at least we won't be
  removing a new entry under a key that happens to have reused the previous
  memory address. Given that our use is caching, it would only lead to
 reduced
  performance, not wrong results, by the way.

  Timings: I've run the same tests as above:
 {{{
 sage: L = []
 sage: for p in prime_range(10000):
 ....:         L.append(GF(p)['x','y'])
 ....:
 sage: import weakref
 sage: from sage.structure.coerce_dict import MonoDict, MonoDictNoRefCache
 sage: M = MonoDict(53)
 sage: for i,K in enumerate(L):
 ....:         M[K] = i
 ....:
 sage: N = MonoDictNoRefCache(53)
 sage: for i,K in enumerate(L):
 ....:         N[K] = i
 ....:
 sage:
 sage: W = weakref.WeakKeyDictionary()
 sage: for i,K in enumerate(L):
 ....:         W[K] = i
 ....:
 sage: K = GF(97)['x','y']
 sage: K2 = GF(next_prime(p))['x','y']
 sage: K in W
 True
 sage: K in M
 True
 sage: K in N
 True
 sage:
 sage: K2 in W
 False
 sage: K2 in M
 False
 sage: K2 in N
 False
 sage:
 sage: %timeit K in W
 625 loops, best of 3: 37.2 µs per loop
 sage: %timeit K in M
 625 loops, best of 3: 146 ns per loop
 sage: %timeit K in N
 625 loops, best of 3: 125 ns per loop
 sage:
 sage: %timeit K2 in W
 625 loops, best of 3: 739 ns per loop
 sage: %timeit K2 in M
 625 loops, best of 3: 477 ns per loop
 sage: %timeit K2 in N
 625 loops, best of 3: 141 ns per loop
 sage:
 sage: %timeit a = W[K]
 625 loops, best of 3: 37.5 µs per loop
 sage: %timeit a = M[K]
 625 loops, best of 3: 165 ns per loop
 sage: %timeit a = N[K]
 625 loops, best of 3: 109 ns per loop
 }}}
 Indeed, dramatically better!

 With the `UniqueRepresentation`:
 {{{
 sage: class A(UniqueRepresentation):
 ....:         def __init__(self,p):
 ....:             self.p = p
 ....:
 sage: L = []
 sage: for p in prime_range(10000):
 ....:         L.append(A(p))
 ....:
 sage: K = A(97)
 sage: K2 = A(next_prime(p))
 sage:
 sage: import weakref
 sage: from sage.structure.coerce_dict import MonoDict, MonoDictNoRefCache
 sage: M = MonoDict(53)
 sage: for i,K in enumerate(L):
 ....:         M[K] = i
 ....:
 sage: N = MonoDictNoRefCache(53)
 sage: for i,K in enumerate(L):
 ....:         N[K] = i
 ....:
 sage: W = weakref.WeakKeyDictionary()
 sage: for i,K in enumerate(L):
 ....:         W[K] = i
 ....:
 sage: K in W
 True
 sage: K in M
 True
 sage: K in N
 True
 sage:
 sage: K2 in W
 False
 sage: K2 in M
 False
 sage: K2 in N
 False
 sage:
 sage: %timeit K in W
 625 loops, best of 3: 1.31 µs per loop
 sage: %timeit K in M
 625 loops, best of 3: 157 ns per loop
 sage: %timeit K in N
 625 loops, best of 3: 278 ns per loop
 sage:
 sage: %timeit K2 in W
 625 loops, best of 3: 1.08 µs per loop
 sage: %timeit K2 in M
 625 loops, best of 3: 421 ns per loop
 sage: %timeit K2 in N
 625 loops, best of 3: 163 ns per loop
 sage:
 sage: %timeit a = W[K]
 625 loops, best of 3: 1.27 µs per loop
 sage: %timeit a = M[K]
 625 loops, best of 3: 341 ns per loop
 sage: %timeit a = N[K]
 625 loops, best of 3: 241 ns per loop
 }}}
 Only the membership test for a key that's in there is faster for the
 `MonoDict`
 and I think I know why. The line
 {{{
            tmp = <object>PyList_GET_ITEM(bucket, i)
 }}}
 compiles to an incref/decref on bucket[i] (and similarly for r).
 Since we're holding refs in the bucket there anyway, we don't need to do
 the inc/dec dance for those.

 In the `MonoDict` this is already optimized because the lookup is only
 done in a normal `dict`, which is one of the most highly tuned pieces of
 code in Python.

-- 
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/12313#comment:162>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/sage-trac?hl=en.

Reply via email to