Here are the original PyBench results from when we switched to the new
dictionary implementation back in February of last year:
Test minimum run-time average run-time
this other diff this other diff
-------------------------------------------------------------------------------
DictWithFloatKeys: 156ms 217ms -28.1% 159ms 223ms -28.9%
DictWithIntegerKeys: 105ms 198ms -47.0% 107ms 200ms -46.6%
DictWithStringKeys: 159ms 253ms -37.2% 161ms 256ms -37.1%
SimpleDictManipulation: 108ms 176ms -38.7% 110ms 180ms -38.7%
(great timing, in 2 weeks the e-mail that had this data would have self
destructed).
"this" is the new implementation, and "other" was the old implementation based
upon System.Collections.Generic.Dictionary<object, object> where we just locked
on all operations. The implementation has undergone frequent tweaks to get it
tuned to a more broad range of scenarios since then but I believe the PyBench
numbers have held within 10% or so. The one possible exception is float keys
which will be much faster in 2.0.1 and in the current sources - although I
don't know why you'd ever hash on floats :).
The big problem with the BCL dictionary is of course that it's not thread safe.
But we also solved another problem at the same time in that we used to have
many different types that would pretend to by Python's dictionary type but
weren't really. So type({}) is type(someOtherObjectYouExpectedToReallyBeADict)
was frequently false - but type({}) == type(...) would still be true due to
this yucky type impersonation feature we had going on. And we've gained a few
other optimizations from having our own dictionary type - for example we can
now quickly check whether or not a dictionary contains string keys for doing
keyword splatting (an idea which even recently came up on python-dev). So this
wasn't entirely just about speeding it up via thread safety lock-free reads.
As for the individual options I'll add a 4th one to your list - a copy on write
dictionary. But I didn't investigate either COW or a reader-writer lock
implementation. COW I didn't look at just because it seems likely that we'll
be dealing with big dictionaries frequently enough that it is likely to lose.
Maybe a hybrid COW/lock free reads will eventually make sense but will also be
much more complex. I didn't look at reader/writer lock simply because I've
looked at using reader/writer locks in the past (I believe I was looking at it
for our list implementation). .NET's basic ReaderWriterLock class has a lot of
overhead when compared to just doing simple operations like a dictionary
read/write. It's really meant to be used when there's a lot of work being done
under the lock. In 3.0 they added ReaderWriterLockSlim which performs better
but requires full trust! Not that we can depend on System.Core yet anyway...
But I'd suspect that needing to take the lock on every r
ead will likely lose - and is only going to become a worse proposition over
time as we get into a more and more multi-core world.
My biggest take away is that I almost wish we made dict/list not thread safe in
IronPython and said that you had to lock if you planned on using them
concurrently. But that's probably impractical as people probably want modules
to not being corrupted under multiple readers/writers :) Ahh, if we only
implemented from __future__ import GIL :)
-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Dan Eloff
Sent: Wednesday, February 11, 2009 1:56 PM
To: Discussion of IronPython
Subject: [IronPython] Question about dictionaries in general
Hi,
This is not an IronPython question really, but I notice the dictionary
architecture used by IronPython allows lock free read access. With
hashtables being so important to dynamic languages this has always
been a subject of interest to me. Did you do any comparisons with the
System.Collections.Generic dictionary? There seems to be three basic
strategies for making a thread-safe dictionary, dictionary with simple
locking on reads & writes, reader-writer lock, and lock free reads. I
would expect that the latter approach performs best, but it can depend
on the ratio of reads to writes and how much contention there is. I'd
be most interested to hear about the experiences of the IronPython
developers on this subject.
Thanks,
-Dan
_______________________________________________
Users mailing list
[email protected]
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com
_______________________________________________
Users mailing list
[email protected]
http://lists.ironpython.com/listinfo.cgi/users-ironpython.com