#13394: Write a WeakValueDictionary with safer key removal
-------------------------------------+-------------------------------------
       Reporter:  nbruin             |        Owner:  rlm
           Type:  enhancement        |       Status:  needs_review
       Priority:  major              |    Milestone:  sage-5.13
      Component:  memleak            |   Resolution:
       Keywords:                     |    Merged in:
        Authors:  Simon King         |    Reviewers:
Report Upstream:  None of the above  |  Work issues:
  - read trac for reasoning.         |       Commit:
         Branch:                     |  e4adaebf2bd8a219f05b28746210cc0b0d0bba88
  u/SimonKing/ticket/13394           |     Stopgaps:
   Dependencies:                     |
-------------------------------------+-------------------------------------

Comment (by SimonKing):

 I use the following data for my benchmarks.
 {{{
 sage: import sage.misc.weak_dict
 sage: import weakref
 sage: L = [(p,GF(p)) for p in prime_range(10^5)]
 sage: N = prime_range(10^6)
 }}}

 Note that this has no hash collisions. Below is an example that
 has lots of hash collisions, on purpose.

 __Creation of dictionaries__

 {{{
 sage: %timeit D = weakref.WeakValueDictionary()
 100000 loops, best of 3: 3.39 us per loop
 sage: %timeit D = sage.misc.weak_dict.WeakValueDictionary()
 1000000 loops, best of 3: 1.65 us per loop
 }}}
 => Sage is faster

 __Initial assignments__

 {{{
 sage: %timeit D = weakref.WeakValueDictionary(L)
 10 loops, best of 3: 35.7 ms per loop
 sage: %timeit D = sage.misc.weak_dict.WeakValueDictionary(L)
 10 loops, best of 3: 38.4 ms per loop
 }}}
 => Python is faster.

 Since the creation of the empty dictionary is faster in Sage, it seems
 that
 there could be room for improvement.

 __Overriding assignments__

 {{{
 sage: D = weakref.WeakValueDictionary(L)
 sage: %timeit for k,v in L: D[k]=v
 10 loops, best of 3: 40.6 ms per loop
 sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
 sage: %timeit for k,v in L: D[k]=v
 10 loops, best of 3: 37.1 ms per loop
 }}}
 => Sage is faster

 This may be surprising, as the initial assignment was slower!

 __Reading__

 {{{
 sage: D = weakref.WeakValueDictionary(L)
 sage: %timeit for k,v in L: w=D[k]
 100 loops, best of 3: 12.3 ms per loop
 sage: %timeit for k in N: w=D.get(k)
 1 loops, best of 3: 326 ms per loop
 sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
 sage: %timeit for k,v in L: w=D[k]
 100 loops, best of 3: 5.54 ms per loop
 sage: %timeit for k in N: w=D.get(k)
 10 loops, best of 3: 31.3 ms per loop
 }}}
 => Sage is ''much'' faster

 __Containment__

 {{{
 sage: D = weakref.WeakValueDictionary(L)
 sage: %timeit for k in N: k in D
 1 loops, best of 3: 347 ms per loop
 sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
 sage: %timeit for k in N: k in D
 100 loops, best of 3: 18.8 ms per loop
 }}}
 => Sage is ''very much'' faster

 __Item deletion__

 {{{
 sage: def test():
 ....:     D = weakref.WeakValueDictionary(L)
 ....:     for k,v in L: del D[k]
 ....:
 sage: %timeit test()
 10 loops, best of 3: 44.9 ms per loop
 sage: def test():
 ....:     D = sage.misc.weak_dict.WeakValueDictionary(L)
 ....:     for k,v in L: del D[k]
 ....:
 sage: %timeit test()
 10 loops, best of 3: 47.5 ms per loop
 }}}
 => Python is a bit faster

 __Iteration__

 {{{
 sage: D = weakref.WeakValueDictionary(L)
 sage: %timeit _ = list(D)
 1000 loops, best of 3: 312 us per loop
 sage: %timeit _ = list(D.itervalues())
 100 loops, best of 3: 2.88 ms per loop
 sage: %timeit _ = list(D.iteritems())
 100 loops, best of 3: 4.9 ms per loop
 sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
 sage: %timeit _ = list(D)
 100 loops, best of 3: 2.67 ms per loop
 sage: %timeit _ = list(D.itervalues())
 100 loops, best of 3: 2.22 ms per loop
 sage: %timeit _ = list(D.iteritems())
 100 loops, best of 3: 3.82 ms per loop
 }}}
 => Iteration over the keys is ''much'' faster in Python, there seems room
 for
 improvement. But the other iterations are slightly better in Sage

 __Copying__

 Note that deepcopy for Python's weak value dictionaries is not exactly
 deep,
 as only the keys, but not the values, are copied:
 {{{
 sage: class C(object): pass
 sage: v = C()
 sage: w = C()
 sage: D = weakref.WeakValueDictionary()
 sage: D[1] = v
 sage: D[w] = ZZ
 sage: E = deepcopy(D)
 sage: w in E  # keys are copied, which is good
 False
 sage: E[1] is D[1] # values aren't copied
 True
 }}}
 But this is actually correct, since copied values would be immediately
 garbage
 collected! So, my implementation does alike and only copies the keys, not
 the values.

 Timings:
 {{{
 sage: D = weakref.WeakValueDictionary(L)
 sage: %timeit E = copy(D)
 10 loops, best of 3: 45.5 ms per loop
 sage: %timeit E = deepcopy(D)
 1 loops, best of 3: 453 ms per loop
 sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
 sage: %timeit E = copy(D)
 10 loops, best of 3: 46.1 ms per loop
 sage: %timeit E = deepcopy(D)
 1 loops, best of 3: 452 ms per loop
 }}}
 => Python and Sage are compatible

 __With hash collisions__

 Here are the corresponding timings with hash collisions.
 {{{
 sage: class Keys(object):
 ....:     def __init__(self, n):
 ....:         self.n = n
 ....:     def __hash__(self):
 ....:         return self.n%5
 ....:     def __cmp__(self, other):
 ....:         c = cmp(type(self),type(other))
 ....:         if c:
 ....:             return c
 ....:         return cmp(self.n, other.n)
 ....:     def __repr__(self):
 ....:         return "Key(%s)"%self.n
 sage: L = [(Keys(n),Integers(n)) for n in range(4000)]
 sage: N = [Keys(n) for n in range(8000)]
 # initial assignments
 sage: %timeit D = weakref.WeakValueDictionary(L)
 1 loops, best of 3: 4.47 s per loop
 sage: %timeit D = sage.misc.weak_dict.WeakValueDictionary(L)  # Sage wins
 1 loops, best of 3: 2.22 s per loop

 # overriding
 sage: Dp = weakref.WeakValueDictionary(L)
 sage: Ds = sage.misc.weak_dict.WeakValueDictionary(L)
 sage: %timeit for k,v in L: Dp[k]=v
 1 loops, best of 3: 2.28 s per loop
 sage: %timeit for k,v in L: Ds[k]=v      # Sage slightly faster
 1 loops, best of 3: 2.17 s per loop

 # reading
 sage: %timeit for k,v in L: w=Dp[k]
 1 loops, best of 3: 2.33 s per loop
 sage: %timeit for k,v in L: w=Ds[k]      # Sage slightly faster
 1 loops, best of 3: 2.17 s per loop

 # containment
 sage: %timeit for k in N: k in Dp
 1 loops, best of 3: 6.91 s per loop
 sage: %timeit for k in N: k in Ds        # Sage slightly faster
 1 loops, best of 3: 6.74 s per loop

 # item deletion, using a different test
 sage: def test(D,L):
 ....:     for k,v in L:
 ....:         del D[k]
 ....:         D[k] = v
 ....:
 sage: %timeit test(Dp, L)
 1 loops, best of 3: 6.88 s per loop
 sage: %timeit test(Ds, L)                # Sage wins
 1 loops, best of 3: 4.77 s per loop

 # iteration
 sage: %timeit _ = list(Dp)               # Python wins
 10000 loops, best of 3: 130 us per loop
 sage: %timeit _ = list(Ds)
 1000 loops, best of 3: 508 us per loop
 sage: %timeit _ = list(Dp.itervalues())
 1000 loops, best of 3: 1.15 ms per loop
 sage: %timeit _ = list(Ds.itervalues())  # Sage wins
 1000 loops, best of 3: 523 us per loop
 sage: %timeit _ = list(Dp.iteritems())
 1000 loops, best of 3: 1.89 ms per loop
 sage: %timeit _ = list(Ds.iteritems())
 1000 loops, best of 3: 885 us per loop   # Sage wins

 # copying
 sage: %timeit E = copy(Dp)
 1 loops, best of 3: 2.23 s per loop
 sage: %timeit E = copy(Ds)               # Sage slightly faster
 1 loops, best of 3: 2.09 s per loop
 sage: %timeit E = deepcopy(Dp)
 1 loops, best of 3: 2.41 s per loop
 sage: %timeit E = deepcopy(Ds)           # Sage slightly faster
 1 loops, best of 3: 2.36 s per loop
 }}}

 '''__Sanity checks__'''

 Apparently we should expect the most severe problems in the case of hash
 collisions. Hence, we use the above class `Keys`. We verify that the
 results
 are as expected. Moreover, we test this during iteration over the keys of
 the dictionary.
 {{{
 sage: class Vals(object): pass
 sage: L = [(Keys(n),Vals()) for n in range(4000)]
 sage: D = sage.misc.weak_dict.WeakValueDictionary(L)
 sage: for k in D.iterkeys():
 ....:     del L[0]
 ....:     assert len(L) == len(D)
 ....:
 sage: len(D)
 1310
 }}}
 It is of course not surprising that some items remain in the dict, since
 some
 are deleted before they are visited in the loop, others are deleted after
 they
 are visited in the loop. The remaining items coincide with what is left in
 `L`:
 {{{
 sage: set(D.items()) == set(L)
 True
 }}}

 Trying the same with Python's weak value dictionaries of course fails:
 {{{
 sage: L = [(Keys(n),Vals()) for n in range(4000)]
 sage: D = weakref.WeakValueDictionary(L)
 sage: for k in D.iterkeys():
 ....:     del L[0]
 ....:     assert len(L) == len(D)
 ....:
 Traceback (most recent call last):
 ...
 RuntimeError: dictionary changed size during iteration
 }}}
 At least the remaining items are correctly kept track of:
 {{{
 sage: set(D.items()) == set(L)
 True
 }}}


 '''__Conclusion__'''

 The timings show that with few exceptions the new implementation is faster
 than the generic implementation in Python's weakref module. This is not
 much
 of a surprise, since weakref is written in pure Python, while my
 implementation is in Cython.

 According to the above timings, only the iteration over the keys of the
 dictionary are considerably faster in the old implementation. This is not
 much
 of a surprise, since in this task the old implementation can fully rely on
 iteration over a Python dict, whereas in the new implementation one needs
 more
 work. I will try to make this faster, though.

 The new implementation behaves well in the case of hash collisions. This
 actually is a surprise for me, and in order to be on the safe side, it
 would
 make sense to add some consistency tests.

 Other than that, the new implementation is not slower but much safer than
 the
 old implementation. We should use it---unless the crash you are getting is
 reproducible.

--
Ticket URL: <http://trac.sagemath.org/ticket/13394#comment:29>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to