#13394: Write a WeakValueDictionary with safer key removal
-------------------------------------+-------------------------------------
Reporter: nbruin | Owner: rlm
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-5.13
Component: memleak | Resolution:
Keywords: | Merged in:
Authors: Simon King | Reviewers:
Report Upstream: None of the above | Work issues:
- read trac for reasoning. | Commit:
Branch: | e4adaebf2bd8a219f05b28746210cc0b0d0bba88
u/SimonKing/ticket/13394 | Stopgaps:
Dependencies: |
-------------------------------------+-------------------------------------
Comment (by SimonKing):
I use the following data for my benchmarks.
{{{
sage: import sage.misc.weak_dict
sage: import weakref
sage: L = [(p,GF(p)) for p in prime_range(10^5)]
sage: N = prime_range(10^6)
}}}
Note that this has no hash collisions. Below is an example that
has lots of hash collisions, on purpose.
__Creation of dictionaries__
{{{
sage: %timeit D = weakref.WeakValueDictionary()
100000 loops, best of 3: 3.39 us per loop
sage: %timeit D = sage.misc.weak_dict.WeakValueDictionary()
1000000 loops, best of 3: 1.65 us per loop
}}}
=> Sage is faster
__Initial assignments__
{{{
sage: %timeit D = weakref.WeakValueDictionary(L)
10 loops, best of 3: 35.7 ms per loop
sage: %timeit D = sage.misc.weak_dict.WeakValueDictionary(L)
10 loops, best of 3: 38.4 ms per loop
}}}
=> Python is faster.
Since the creation of the empty dictionary is faster in Sage, it seems
that
there could be room for improvement.
__Overriding assignments__
{{{
sage: D = weakref.WeakValueDictionary(L)
sage: %timeit for k,v in L: D[k]=v
10 loops, best of 3: 40.6 ms per loop
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
sage: %timeit for k,v in L: D[k]=v
10 loops, best of 3: 37.1 ms per loop
}}}
=> Sage is faster
This may be surprising, as the initial assignment was slower!
__Reading__
{{{
sage: D = weakref.WeakValueDictionary(L)
sage: %timeit for k,v in L: w=D[k]
100 loops, best of 3: 12.3 ms per loop
sage: %timeit for k in N: w=D.get(k)
1 loops, best of 3: 326 ms per loop
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
sage: %timeit for k,v in L: w=D[k]
100 loops, best of 3: 5.54 ms per loop
sage: %timeit for k in N: w=D.get(k)
10 loops, best of 3: 31.3 ms per loop
}}}
=> Sage is ''much'' faster
__Containment__
{{{
sage: D = weakref.WeakValueDictionary(L)
sage: %timeit for k in N: k in D
1 loops, best of 3: 347 ms per loop
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
sage: %timeit for k in N: k in D
100 loops, best of 3: 18.8 ms per loop
}}}
=> Sage is ''very much'' faster
__Item deletion__
{{{
sage: def test():
....: D = weakref.WeakValueDictionary(L)
....: for k,v in L: del D[k]
....:
sage: %timeit test()
10 loops, best of 3: 44.9 ms per loop
sage: def test():
....: D = sage.misc.weak_dict.WeakValueDictionary(L)
....: for k,v in L: del D[k]
....:
sage: %timeit test()
10 loops, best of 3: 47.5 ms per loop
}}}
=> Python is a bit faster
__Iteration__
{{{
sage: D = weakref.WeakValueDictionary(L)
sage: %timeit _ = list(D)
1000 loops, best of 3: 312 us per loop
sage: %timeit _ = list(D.itervalues())
100 loops, best of 3: 2.88 ms per loop
sage: %timeit _ = list(D.iteritems())
100 loops, best of 3: 4.9 ms per loop
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
sage: %timeit _ = list(D)
100 loops, best of 3: 2.67 ms per loop
sage: %timeit _ = list(D.itervalues())
100 loops, best of 3: 2.22 ms per loop
sage: %timeit _ = list(D.iteritems())
100 loops, best of 3: 3.82 ms per loop
}}}
=> Iteration over the keys is ''much'' faster in Python, there seems room
for
improvement. But the other iterations are slightly better in Sage
__Copying__
Note that deepcopy for Python's weak value dictionaries is not exactly
deep,
as only the keys, but not the values, are copied:
{{{
sage: class C(object): pass
sage: v = C()
sage: w = C()
sage: D = weakref.WeakValueDictionary()
sage: D[1] = v
sage: D[w] = ZZ
sage: E = deepcopy(D)
sage: w in E # keys are copied, which is good
False
sage: E[1] is D[1] # values aren't copied
True
}}}
But this is actually correct, since copied values would be immediately
garbage
collected! So, my implementation does alike and only copies the keys, not
the values.
Timings:
{{{
sage: D = weakref.WeakValueDictionary(L)
sage: %timeit E = copy(D)
10 loops, best of 3: 45.5 ms per loop
sage: %timeit E = deepcopy(D)
1 loops, best of 3: 453 ms per loop
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
sage: %timeit E = copy(D)
10 loops, best of 3: 46.1 ms per loop
sage: %timeit E = deepcopy(D)
1 loops, best of 3: 452 ms per loop
}}}
=> Python and Sage are compatible
__With hash collisions__
Here are the corresponding timings with hash collisions.
{{{
sage: class Keys(object):
....: def __init__(self, n):
....: self.n = n
....: def __hash__(self):
....: return self.n%5
....: def __cmp__(self, other):
....: c = cmp(type(self),type(other))
....: if c:
....: return c
....: return cmp(self.n, other.n)
....: def __repr__(self):
....: return "Key(%s)"%self.n
sage: L = [(Keys(n),Integers(n)) for n in range(4000)]
sage: N = [Keys(n) for n in range(8000)]
# initial assignments
sage: %timeit D = weakref.WeakValueDictionary(L)
1 loops, best of 3: 4.47 s per loop
sage: %timeit D = sage.misc.weak_dict.WeakValueDictionary(L) # Sage wins
1 loops, best of 3: 2.22 s per loop
# overriding
sage: Dp = weakref.WeakValueDictionary(L)
sage: Ds = sage.misc.weak_dict.WeakValueDictionary(L)
sage: %timeit for k,v in L: Dp[k]=v
1 loops, best of 3: 2.28 s per loop
sage: %timeit for k,v in L: Ds[k]=v # Sage slightly faster
1 loops, best of 3: 2.17 s per loop
# reading
sage: %timeit for k,v in L: w=Dp[k]
1 loops, best of 3: 2.33 s per loop
sage: %timeit for k,v in L: w=Ds[k] # Sage slightly faster
1 loops, best of 3: 2.17 s per loop
# containment
sage: %timeit for k in N: k in Dp
1 loops, best of 3: 6.91 s per loop
sage: %timeit for k in N: k in Ds # Sage slightly faster
1 loops, best of 3: 6.74 s per loop
# item deletion, using a different test
sage: def test(D,L):
....: for k,v in L:
....: del D[k]
....: D[k] = v
....:
sage: %timeit test(Dp, L)
1 loops, best of 3: 6.88 s per loop
sage: %timeit test(Ds, L) # Sage wins
1 loops, best of 3: 4.77 s per loop
# iteration
sage: %timeit _ = list(Dp) # Python wins
10000 loops, best of 3: 130 us per loop
sage: %timeit _ = list(Ds)
1000 loops, best of 3: 508 us per loop
sage: %timeit _ = list(Dp.itervalues())
1000 loops, best of 3: 1.15 ms per loop
sage: %timeit _ = list(Ds.itervalues()) # Sage wins
1000 loops, best of 3: 523 us per loop
sage: %timeit _ = list(Dp.iteritems())
1000 loops, best of 3: 1.89 ms per loop
sage: %timeit _ = list(Ds.iteritems())
1000 loops, best of 3: 885 us per loop # Sage wins
# copying
sage: %timeit E = copy(Dp)
1 loops, best of 3: 2.23 s per loop
sage: %timeit E = copy(Ds) # Sage slightly faster
1 loops, best of 3: 2.09 s per loop
sage: %timeit E = deepcopy(Dp)
1 loops, best of 3: 2.41 s per loop
sage: %timeit E = deepcopy(Ds) # Sage slightly faster
1 loops, best of 3: 2.36 s per loop
}}}
'''__Sanity checks__'''
Apparently we should expect the most severe problems in the case of hash
collisions. Hence, we use the above class `Keys`. We verify that the
results
are as expected. Moreover, we test this during iteration over the keys of
the dictionary.
{{{
sage: class Vals(object): pass
sage: L = [(Keys(n),Vals()) for n in range(4000)]
sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
sage: for k in D.iterkeys():
....: del L[0]
....: assert len(L) == len(D)
....:
sage: len(D)
1310
}}}
It is of course not surprising that some items remain in the dict, since
some
are deleted before they are visited in the loop, others are deleted after
they
are visited in the loop. The remaining items coincide with what is left in
`L`:
{{{
sage: set(D.items()) == set(L)
True
}}}
Trying the same with Python's weak value dictionaries of course fails:
{{{
sage: L = [(Keys(n),Vals()) for n in range(4000)]
sage: D = weakref.WeakValueDictionary(L)
sage: for k in D.iterkeys():
....: del L[0]
....: assert len(L) == len(D)
....:
Traceback (most recent call last):
...
RuntimeError: dictionary changed size during iteration
}}}
At least the remaining items are correctly kept track of:
{{{
sage: set(D.items()) == set(L)
True
}}}
'''__Conclusion__'''
The timings show that with few exceptions the new implementation is faster
than the generic implementation in Python's weakref module. This is not
much
of a surprise, since weakref is written in pure Python, while my
implementation is in Cython.
According to the above timings, only the iteration over the keys of the
dictionary are considerably faster in the old implementation. This is not
much
of a surprise, since in this task the old implementation can fully rely on
iteration over a Python dict, whereas in the new implementation one needs
more
work. I will try to make this faster, though.
The new implementation behaves well in the case of hash collisions. This
actually is a surprise for me, and in order to be on the safe side, it
would
make sense to add some consistency tests.
Other than that, the new implementation is not slower but much safer than
the
old implementation. We should use it---unless the crash you are getting is
reproducible.
--
Ticket URL: <http://trac.sagemath.org/ticket/13394#comment:29>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.