#13394: Write a WeakValueDictionary with safer key removal
-------------------------------------+-------------------------------------
       Reporter:  nbruin             |        Owner:  rlm
           Type:  enhancement        |       Status:  needs_review
       Priority:  major              |    Milestone:  sage-5.13
      Component:  memleak            |   Resolution:
       Keywords:                     |    Merged in:
        Authors:  Simon King         |    Reviewers:
Report Upstream:  None of the above  |  Work issues:
  - read trac for reasoning.         |       Commit:
         Branch:                     |  fab0ed4112b9f798e2690f4c885b57cd711ea698
  u/SimonKing/ticket/13394           |     Stopgaps:
   Dependencies:                     |
-------------------------------------+-------------------------------------

Comment (by nbruin):

 Replying to [comment:52 SimonKing]:
 > {{{
 > while i<len(bucket):
 >     i += 1
 >     ...
 > }}}
 > That's slower, though.

 No, that's still not safe (apart from the fact you'd do `i+=2`) . If i=2,
 you could have a `del bucket[0:2]` happen and if the bucket was long
 enough, you'll now be skipping a key-value pair. That could be the pair
 you were supposed to find! You might be able to get by with placing "not
 in use" sentinels in the bucket slots you're not using anymore, instead of
 properly deleting them. That's what python does (see below), but then
 cleaning/reusing that garbage comes natural in python's scheme.

 If your dictionary changes, you have to start over. Note that this isn't
 an issue in `TripleDict` etc., nor is it for exclusively string-keyed
 dictionaries in python, because then you can guarantee the dict won't
 change as a result of your equality tests.

 To see the kind of care necessary (and taken in `dict`!):

 {{{
 D={}
 class EvilKey(object):
     def __init__(self,nr):
         self.evil = False
         self.nr = nr
     def __hash__(self):
         return 0
     def __repr__(self):
         return "Evil key nr. %s"%self.nr
     def __eq__(self,other):
         if self.evil:
             del D[0]
         return self is other

 k1=EvilKey(1)
 k2=EvilKey(2)
 D[0]='a'
 D[k1]='b'
 k1.evil=True
 D[k2]='c'
 k1.evil=False
 D[k1]='d'
 }}}
 In this script, `D` does not get reshaped, so python does not "start over"
 at any point. However, a side effect of `D[k2]='c'` is that an earlier
 slot (previously taken by `0`) becomes available again. This slot is
 already passed, in the probing for finding a spot for `D[k2]`, so its
 value does not end up there. This seems dangerous for the `D[k1]='d'`
 assignment: a free slot would be encountered before `k1` is found to
 already be a key. `dict` doesn't get fooled, though:
 the slot previously occupied for `D[0]` is marked as "freed", not as
 "never used".

 So, python continues probing the entire bucket (for as far as you can talk
 about buckets in an open-coded hash table) and indeed finds the key `k1`
 is already in there and uses that entry. If it would have found that the
 key `k1` didn't occur yet, it would have reused the freed slot it found
 earlier.

 I'd say this is all a pretty good argument to try and stick with python's
 `dict`. Coding a bullet-proof, high performance hash table is not an easy
 task.

--
Ticket URL: <http://trac.sagemath.org/ticket/13394#comment:53>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica, 
and MATLAB

-- 
You received this message because you are subscribed to the Google Groups 
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.

Reply via email to