#13394: Write a WeakValueDictionary with safer key removal
-------------------------------------+-------------------------------------
Reporter: nbruin | Owner: rlm
Type: enhancement | Status: needs_review
Priority: major | Milestone: sage-5.13
Component: memleak | Resolution:
Keywords: | Merged in:
Authors: Simon King | Reviewers:
Report Upstream: None of the above | Work issues:
- read trac for reasoning. | Commit:
Branch: | fab0ed4112b9f798e2690f4c885b57cd711ea698
u/SimonKing/ticket/13394 | Stopgaps:
Dependencies: |
-------------------------------------+-------------------------------------
Comment (by nbruin):
Replying to [comment:52 SimonKing]:
> {{{
> while i<len(bucket):
> i += 1
> ...
> }}}
> That's slower, though.
No, that's still not safe (apart from the fact you'd do `i+=2`) . If i=2,
you could have a `del bucket[0:2]` happen and if the bucket was long
enough, you'll now be skipping a key-value pair. That could be the pair
you were supposed to find! You might be able to get by with placing "not
in use" sentinels in the bucket slots you're not using anymore, instead of
properly deleting them. That's what python does (see below), but then
cleaning/reusing that garbage comes natural in python's scheme.
If your dictionary changes, you have to start over. Note that this isn't
an issue in `TripleDict` etc., nor is it for exclusively string-keyed
dictionaries in python, because then you can guarantee the dict won't
change as a result of your equality tests.
To see the kind of care necessary (and taken in `dict`!):
{{{
D={}
class EvilKey(object):
def __init__(self,nr):
self.evil = False
self.nr = nr
def __hash__(self):
return 0
def __repr__(self):
return "Evil key nr. %s"%self.nr
def __eq__(self,other):
if self.evil:
del D[0]
return self is other
k1=EvilKey(1)
k2=EvilKey(2)
D[0]='a'
D[k1]='b'
k1.evil=True
D[k2]='c'
k1.evil=False
D[k1]='d'
}}}
In this script, `D` does not get reshaped, so python does not "start over"
at any point. However, a side effect of `D[k2]='c'` is that an earlier
slot (previously taken by `0`) becomes available again. This slot is
already passed, in the probing for finding a spot for `D[k2]`, so its
value does not end up there. This seems dangerous for the `D[k1]='d'`
assignment: a free slot would be encountered before `k1` is found to
already be a key. `dict` doesn't get fooled, though:
the slot previously occupied for `D[0]` is marked as "freed", not as
"never used".
So, python continues probing the entire bucket (for as far as you can talk
about buckets in an open-coded hash table) and indeed finds the key `k1`
is already in there and uses that entry. If it would have found that the
key `k1` didn't occur yet, it would have reused the freed slot it found
earlier.
I'd say this is all a pretty good argument to try and stick with python's
`dict`. Coding a bullet-proof, high performance hash table is not an easy
task.
--
Ticket URL: <http://trac.sagemath.org/ticket/13394#comment:53>
Sage <http://www.sagemath.org>
Sage: Creating a Viable Open Source Alternative to Magma, Maple, Mathematica,
and MATLAB
--
You received this message because you are subscribed to the Google Groups
"sage-trac" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To post to this group, send email to [email protected].
Visit this group at http://groups.google.com/group/sage-trac.
For more options, visit https://groups.google.com/groups/opt_out.