#12313: Fix yet another memory leak caused by caching of coercion data
--------------------------------------------------+-------------------------
       Reporter:  SimonKing                       |  Owner:
           Type:  defect                          |  Status:  needs_review
       Priority:  major                           |  Milestone:  sage-5.3
      Component:  memleak                         |  Resolution:
       Keywords:  coercion weak dictionary        |  Work issues:
Report Upstream:  N/A                             |  Reviewers:  Simon King,
                                                  |  Jean-Pierre Flori,
                                                  |  John Perry
        Authors:  Simon King, Jean-Pierre Flori   |  Merged in:
   Dependencies:  #11521, #11599, #12969, #12215  |  Stopgaps:
--------------------------------------------------+-------------------------
Comment (by nbruin):
OK! Example implementation done. A couple of things I found:
 - Some of your doctests for MonoDict actually test TripleDict!
 - Your iterators can recurse quite deeply if there are lots of empty
   buckets in a row, which can easily happen.
 - What happens if we find a dead entry in setitem? How can we be sure the
   pending callback for that entry doesn't fire on our new value? I think
   explicitly deleting the weakref should do the trick, but then we must
   ensure that nobody else gets a chance to keep those weakrefs alive; I've
   gone with that for now. An alternative is to let the callback check that
   the weakref stored in the relevant bucket is indeed a weakref and dead.
   Then at least we won't be removing a new entry under a key that happens
   to have reused the previous memory address. Given that our use is
   caching, that would only lead to reduced performance, not wrong results,
   by the way.
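For what it's worth, the callback-validation alternative can be sketched in
pure Python. This is a toy, not the actual `coerce_dict` code: the class name
`IdKeyDict` and its layout (a plain dict mapping `id(key)` to a
`(weakref, value)` pair) are hypothetical, chosen only to show the guard in
the callback.
{{{
import weakref

class IdKeyDict:
    """Toy sketch: values keyed by id(key), with a guarded weakref callback.

    The callback re-checks that the bucket still holds *this* dead weakref
    before deleting, so a recycled memory address with a fresh entry under
    the same id() is left alone.
    """
    def __init__(self):
        self._data = {}  # id(key) -> (weakref to key, value)

    def __setitem__(self, key, value):
        h = id(key)
        def callback(ref, h=h):
            entry = self._data.get(h)
            # Guard: only remove the entry if it is still our dead weakref.
            if entry is not None and entry[0] is ref:
                del self._data[h]
        self._data[h] = (weakref.ref(key, callback), value)

    def __getitem__(self, key):
        ref, value = self._data[id(key)]
        if ref() is not key:  # dead entry, or a recycled address
            raise KeyError(key)
        return value

    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False
}}}
Overwriting an id that already holds a dead entry is then harmless: the old
pending callback sees a different weakref in the bucket and does nothing.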
Timings: I've run the same tests as above:
{{{
sage: L = []
sage: for p in prime_range(10000):
....: L.append(GF(p)['x','y'])
....:
sage: import weakref
sage: from sage.structure.coerce_dict import MonoDict, MonoDictNoRefCache
sage: M = MonoDict(53)
sage: for i,K in enumerate(L):
....: M[K] = i
....:
sage: N = MonoDictNoRefCache(53)
sage: for i,K in enumerate(L):
....: N[K] = i
....:
sage:
sage: W = weakref.WeakKeyDictionary()
sage: for i,K in enumerate(L):
....: W[K] = i
....:
sage: K = GF(97)['x','y']
sage: K2 = GF(next_prime(p))['x','y']
sage: K in W
True
sage: K in M
True
sage: K in N
True
sage:
sage: K2 in W
False
sage: K2 in M
False
sage: K2 in N
False
sage:
sage: %timeit K in W
625 loops, best of 3: 37.2 µs per loop
sage: %timeit K in M
625 loops, best of 3: 146 ns per loop
sage: %timeit K in N
625 loops, best of 3: 125 ns per loop
sage:
sage: %timeit K2 in W
625 loops, best of 3: 739 ns per loop
sage: %timeit K2 in M
625 loops, best of 3: 477 ns per loop
sage: %timeit K2 in N
625 loops, best of 3: 141 ns per loop
sage:
sage: %timeit a = W[K]
625 loops, best of 3: 37.5 µs per loop
sage: %timeit a = M[K]
625 loops, best of 3: 165 ns per loop
sage: %timeit a = N[K]
625 loops, best of 3: 109 ns per loop
}}}
Indeed, dramatically better!
With the `UniqueRepresentation`:
{{{
sage: class A(UniqueRepresentation):
....: def __init__(self,p):
....: self.p = p
....:
sage: L = []
sage: for p in prime_range(10000):
....: L.append(A(p))
....:
sage: K = A(97)
sage: K2 = A(next_prime(p))
sage:
sage: import weakref
sage: from sage.structure.coerce_dict import MonoDict, MonoDictNoRefCache
sage: M = MonoDict(53)
sage: for i,K in enumerate(L):
....: M[K] = i
....:
sage: N = MonoDictNoRefCache(53)
sage: for i,K in enumerate(L):
....: N[K] = i
....:
sage: W = weakref.WeakKeyDictionary()
sage: for i,K in enumerate(L):
....: W[K] = i
....:
sage: K in W
True
sage: K in M
True
sage: K in N
True
sage:
sage: K2 in W
False
sage: K2 in M
False
sage: K2 in N
False
sage:
sage: %timeit K in W
625 loops, best of 3: 1.31 µs per loop
sage: %timeit K in M
625 loops, best of 3: 157 ns per loop
sage: %timeit K in N
625 loops, best of 3: 278 ns per loop
sage:
sage: %timeit K2 in W
625 loops, best of 3: 1.08 µs per loop
sage: %timeit K2 in M
625 loops, best of 3: 421 ns per loop
sage: %timeit K2 in N
625 loops, best of 3: 163 ns per loop
sage:
sage: %timeit a = W[K]
625 loops, best of 3: 1.27 µs per loop
sage: %timeit a = M[K]
625 loops, best of 3: 341 ns per loop
sage: %timeit a = N[K]
625 loops, best of 3: 241 ns per loop
}}}
Only the membership test for a key that is present is faster for the
`MonoDict`, and I think I know why. The line
{{{
tmp = <object>PyList_GET_ITEM(bucket, i)
}}}
compiles to an incref/decref on bucket[i] (and similarly for r). Since we
are holding references in the bucket anyway, we don't need to do the
inc/dec dance for those. In the `MonoDict` this is already optimized
because the lookup is done entirely in a normal `dict`, which is one of the
most highly tuned pieces of code in Python.
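To illustrate that last point with a standalone sketch (again hypothetical,
not the Cython code itself): a lookup in a plain `dict` keyed by `id()`
never calls the key's `__hash__` or `__eq__` at all, whereas a
`WeakKeyDictionary` must hash the key object and unwrap weakrefs, which is
much of the gap in the timings above.
{{{
import weakref

class Key(object):
    pass

k = Key()

W = weakref.WeakKeyDictionary()
W[k] = 1           # hashes k and wraps it in a weakref

D = {id(k): 1}     # plain dict keyed by the memory address

# Both find the value, but the plain-dict path is a single tuned
# C-level dict lookup on a small int.
assert W[k] == 1
assert D[id(k)] == 1
}}}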
--
Ticket URL: <http://trac.sagemath.org/sage_trac/ticket/12313#comment:162>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB