#15956: WeakValueDictionary does not handle unhashable keys correctly
-------------------------------------+-------------------------------------
Reporter: saraedum | Owner:
Type: defect | Status: needs_review
Priority: minor | Milestone: sage-6.2
Component: misc | Resolution:
Keywords: hash | Merged in:
Authors: | Reviewers:
Report Upstream: N/A | Work issues:
Branch: | Commit:
u/saraedum/ticket/15956 | 3a05a1aec6459e59f0b7069af4bc1100a71a9502
Dependencies: | Stopgaps:
-------------------------------------+-------------------------------------
Changes (by nbruin):
* commit: => 3a05a1aec6459e59f0b7069af4bc1100a71a9502
Comment:
The proposed fix is very elegant and minimal, but it does noticeably
affect a code path that deserves to be very fast: testing that a key does
''not'' occur in the dictionary. Timings:
{{{
%cpaste
cython("""
def test(N,D):
    cdef long n=N
    for i in range(n):
        1 in D
""")
--
}}}
{{{
sage: D=sage.misc.weak_dict.WeakValueDictionary()
sage: D[2]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 122 ms per loop
//with the extra hash call:
5 loops, best of 3: 151 ms per loop
sage: D[1]=ZZ
sage: timeit("test(10^7,D)")
5 loops, best of 3: 466 ms per loop
//this of course times the same with the extra hash call
}}}
Compared with a normal dictionary:
{{{
sage: N={}
sage: N[2]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 90.6 ms per loop
sage: N[1]=ZZ
sage: timeit("test(10^7,N)")
5 loops, best of 3: 377 ms per loop
}}}
The change from 122ms to 151ms is an increase of about 25% in time taken,
and that is for integer keys, which hash extremely quickly. I've also tried
it with the string key '1', where I got 101ms vs. 127ms, so a penalty of
around 30ms per 10^7 lookups seems to be expected.
Of course, it's nice if `sage.misc.weak_dict.WeakValueDictionary` is a
perfect drop-in replacement for `weakref.WeakValueDictionary`, but having
better performance at the expense of different edge-case error reporting
shouldn't just be dismissed. In those edge cases we merely report "key not
here" in situations where that is actually true, whereas some other
dictionaries raise "key not hashable", so it's hard to think of a case where
the difference seriously affects legitimate program logic.
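For concreteness, a minimal illustration of the edge case in question (plain
Python here, not the weak_dict code itself):
{{{
# Illustration only: the edge case is a key that cannot be hashed at all.
D = {2: None}
key = [1, 2]          # lists are unhashable
key in D              # an ordinary dict raises TypeError: unhashable type
# Without the extra hash call, WeakValueDictionary's fast path would just
# answer False here, which is also a true statement about the dictionary.
}}}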
We could of course patch Python and backport `PyDict_GetItemWithError`,
which would give a proper fix without the performance degradation.
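A rough sketch of what that could look like in Cython (assuming
`PyDict_GetItemWithError` is available, e.g. via such a backport; the names
below are illustrative and not the actual weak_dict code):
{{{
from cpython.ref cimport PyObject

cdef extern from "Python.h":
    # Returns a borrowed reference; NULL with an exception set on error,
    # NULL without an exception if the key is simply absent.
    PyObject* PyDict_GetItemWithError(object d, object key) except? NULL

def contains(dict buckets, key):
    cdef PyObject* ref = PyDict_GetItemWithError(buckets, key)
    # A hashing or comparison error propagates here instead of being
    # silently turned into "key not found".
    return ref != NULL
}}}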
For the current proposal, I'd like to see some real-world motivation to
bring the error reporting in line with other dictionaries before I'd
consider significantly degrading performance.
A further problem with the current proposal: it only fixes one symptom.
The nastier example
[http://hg.python.org/cpython/file/5cab0ada97b2/Objects/dictobject.c#l1046]
(stack overflow during key comparison) isn't addressed by verifying that
the key hashes. Going all the way with the `PyDict_GetItemWithError`
solution (or equivalent) would also solve that.
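As a contrived stand-in for that example: a key can hash perfectly well and
still blow up during the comparison step of the lookup, so an up-front
`hash(key)` check doesn't catch it:
{{{
# Illustration only: hashing succeeds, so checking hash(key) passes,
# yet the lookup still fails inside the key comparison.
class BadEq(object):
    def __hash__(self):
        return hash(1)        # collides with the stored key 1
    def __eq__(self, other):
        raise RuntimeError("comparison failed")

D = {1: None}
hash(BadEq())                 # fine
BadEq() in D                  # raises RuntimeError during key comparison
}}}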
----
New commits:
||[http://git.sagemath.org/sage.git/commit/?id=3a05a1aec6459e59f0b7069af4bc1100a71a9502
3a05a1a]||{{{Improved handling of unhashable keys in weak dictionary}}}||
--
Ticket URL: <http://trac.sagemath.org/ticket/15956#comment:5>